Abstract

We consider data transmission through a time-selective, correlated (first-order Markov) Rayleigh
fading channel subject to an average power constraint. The channel is estimated at the receiver with a
pilot signal, and the estimate is fed back to the transmitter. The estimate is used for coherent
demodulation, and to adapt the data and pilot powers. We explicitly determine the optimal pilot and
data power control policies in a continuous-time limit where the channel state evolves as an
Ornstein-Uhlenbeck diffusion process, and is estimated by a Kalman filter at the receiver. The optimal
pilot policy switches between zero and the maximum (peak-constrained) value ("bang-bang" control), and
approximates the optimal discrete-time policy at low Signal-to-Noise Ratios (equivalently, large
bandwidths). The switching boundary is defined in terms of the system state (estimated channel mean and
associated error variance), and can be explicitly computed. Under the optimal policy, the transmitter
conserves power by decreasing the training power when the channel is faded, thereby increasing the data
rate. Numerical results show a significant increase in achievable rate due to the adaptive training
scheme with feedback, relative to constant (non-adaptive) training, which does not require feedback.
The gain is more pronounced at relatively low SNRs and with fast fading. Results are further verified
through Monte Carlo simulations.

Index Terms 

Limited-rate feedback, Gauss-Markov channel, channel estimation, adaptive training, wideband 
channel, diffusion approximation, free boundary problems, Bang-Bang control, Variational Inequalities. 
I. Introduction

The achievable rate for a time-selective fading channel depends on what channel state
information (CSI) is available at the receiver and transmitter. Namely, CSI at the receiver can
increase the rate by allowing coherent detection, and CSI at the transmitter allows adaptive rate
and power control (e.g., see [1, Ch. 6]). Obtaining CSI at the receiver and/or transmitter requires
overhead in the form of a pilot signal and feedback.

We consider a correlated time-selective flat Rayleigh fading channel, which is unknown at both
the receiver and transmitter. The transmitter divides its power between a pilot, used to estimate 
the channel at the receiver, and the data. Given an average transmitted power constraint, our 
problem is to optimize the instantaneous pilot and data powers as functions of the time-varying 
channel realization. Our performance objective is a lower bound on the achievable rate, which 
accounts for the channel estimation error. 

Power control with channel state feedback, assuming the channel is perfectly known at the 
receiver, has been considered in [2]-[6]. There the focus is on optimizing the input distribution 
for different channel models using criteria such as rate maximization and outage minimization. 
Optimal power allocation in the presence of channel estimation error has been considered in 
[7], [8]. The problem of optimal pilot design for a variety of fading channel models has been 
considered in [9]-[15]. There the pilot power and placement, once optimized, are fixed and are not
adapted with the channel conditions. A key difference here is that the transmitter uses the CSI 
to adapt jointly the instantaneous data and pilot powers. Because the channel is correlated in 
time, adapting the pilot power with the estimated channel state can increase the achievable rate. 
We also remark that although we analyze a single narrowband fading channel, our results apply 
to a set of parallel fading Gaussian channels, where the average power is split over all channels. 

We start with a correlated block fading model in which the sequence of channel gains is
Gauss-Markov¹ with known statistics at the receiver. The channel estimate is updated at the beginning
of each block using a Kalman filter, and determines the power for the data, and the power for the
pilot symbols in the succeeding coherence block. Optimal power control policies are specified
implicitly through a Bellman equation [20]. Other dynamic programming formulations of power
control problems have been presented in [6], [21]-[24], although in that work the channel is
either known perfectly (perhaps with a delay), or is unknown and not estimated.

1 Several theoretical and measurement-based studies, such as [16]-[19], have argued that this is a
reasonable model for time-selective wireless channels.

Because an analytical solution to the Bellman equation appears to be difficult to obtain, we 
study a diffusion limit in which the correlation between successive coherence blocks tends to one 
and the average power goes to zero. (This corresponds to a wideband channel model in which 
the available power is divided uniformly over a large number of parallel flat Rayleigh fading 
sub-channels.) In this limit, the Gauss-Markov channel becomes a continuous-time Ornstein- 
Uhlenbeck process [25], and the Bellman equation becomes a partial differential equation (PDE). 
A diffusion equation is also derived, which describes the evolution of the state (channel estimate 
and the associated error variance), given a power allocation policy. In this limit, we show that 
given a peak power constraint for the pilot power, the optimal pilot power control policy is 
a switching policy ("bang-bang" control): the pilot power is either the maximum allowable or 
zero, depending upon the current state. Hence the optimal pilot power control policy requires 
at most one feedback bit per coherence block. Also, the optimal data power control policy is 
found to be a variation of waterfilling [1]. Other work in which the wireless channel is modeled 
as a diffusion process is presented in [23], [26]. 

The switching points for the optimal policy form a contour in the state space, which is 
referred to as the free boundary for the corresponding PDE. Solving this PDE then falls in the 
class of free boundary problems [27], [28]. We show that in the diffusion limit the system state 
becomes confined to a narrow region along the boundary. Furthermore, the associated probability 
distribution over the boundary is exponential. That enables a numerical characterization of the 
boundary shape. 

Our results show that the average pilot power should decrease as the channel becomes more
severely faded. We observe that the optimal switching policy is equivalent to adapting the pilot
symbol insertion rate with fixed pilot symbol energy.² The optimal pilot insertion rate as a
function of the channel estimate is then determined by the shape of the free boundary. We show
that the boundary shape essentially shifts pilot power from more probable (faded) states to less
probable (good) states. Furthermore, the boundary shape guarantees that the channel estimate is
sufficiently accurate to guide the power adaptation.

2 Alternatively, the same performance can be achieved by fixing the pilot insertion rate and varying
the pilot power. However, in principle that would require infinite-precision feedback in contrast to
bang-bang control, which requires one feedback bit per coherence block.

Numerical results show that pilot power adaptation can provide substantial gains in achievable 
rates (up to a factor of two). The gains are more pronounced at low SNRs and with fast fading 
channels. Although these results are derived in the limit of large bandwidth (low SNR), Monte 
Carlo simulations show that they provide an accurate estimate of the performance when the 
bandwidth is large but finite (a few hundred coherence bands). Moreover, the optimal switching 
policy in the diffusion limit accurately approximates the optimal pilot power control policy for 
the discrete-time model, and provides essentially the same performance gains relative to constant 
pilot power. 

To limit the overall feedback rate, we also consider combining the adaptive pilot power with 
"on-off" data power control, which also switches between a fixed positive value and zero. (Hence 
that also requires at most one bit feedback per coherence block.) The corresponding optimal free 
boundaries are computed, and results show that this scheme gives negligible loss in the achievable 
rate. 

The next section presents the system model and Section III formulates the pilot optimization
problem as a dynamic program. Section IV presents the associated diffusion limit and the
corresponding Bellman equation. The optimal policy is then characterized in Sections V-VII
with optimal data power control, and in Section VIII with optimal on-off data power control.
Numerical results showing free boundaries and the corresponding performance are also presented
in Sections VII and VIII. Training overhead is discussed in Section IX and conclusions and
remaining issues are discussed in Section X.



II. Correlated Block Fading Model

We start with a block fading channel model in which each coherence block contains $M$
symbols, consisting of $T$ pilot symbols and $D$ data symbols. The vector of channel outputs for
coherence block $i$ is given by

$$\mathbf{y}_i = h_i \begin{bmatrix} \sqrt{P_{i,T}}\,\mathbf{S}_{i,T} \\ \sqrt{P_i}\,\mathbf{S}_i \end{bmatrix} + \mathbf{z}_i \qquad (1)$$

where $\mathbf{S}_{i,T}$ and $\mathbf{S}_i$ are, respectively, vectors containing the pilot and data symbols, each with
unit variance, and $P_{i,T}$ and $P_i$ are the associated pilot and data powers. The noise $\mathbf{z}_i$ contains
circularly symmetric complex Gaussian (CSCG) random variables, and is white with covariance
$\sigma_z^2 \mathbf{I}$. The channel gain $h_i$ is also CSCG, is constant within the block, and evolves from block to
block according to a Gauss-Markov process, i.e.,

$$h_{i+1} = r h_i + \sqrt{1 - r^2}\, w_i \qquad (2)$$

where $w_i$ is an independent CSCG random variable with mean zero and variance $\sigma_h^2$, and
$r \in [0, 1]$ determines the correlation between successive blocks. We will assume that $r$ and $\sigma_h^2$
are known at the receiver. The training energy per symbol in block $i$ is defined as $\epsilon_i = \alpha P_{i,T}$,
where $\alpha = T/M$. In what follows, it will be convenient to write $P_{i,T}$ as $\epsilon_i/\alpha$.

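The receiver updates the channel estimate during each coherence block with a Kalman filter
[29], given the model (2) and the pilot symbols, and relays the estimate back to the transmitter.
The feedback occurs between the pilot and data symbols, and is assumed to occupy an insignificant
fraction of the coherence time. We re-write the noise vector $\mathbf{z}_i$ as $[\mathbf{z}_{i,T}^\dagger \;\; \mathbf{z}_{i,D}^\dagger]^\dagger$ where $\mathbf{z}_{i,T}$ and
$\mathbf{z}_{i,D}$ are, respectively, $T \times 1$ and $D \times 1$ vectors and $\dagger$ denotes Hermitian transpose. The channel
estimate $\hat{h}_i$ and estimation error $\theta_i = E(|h_i|^2) - E(|\hat{h}_i|^2)$ evolve according to the following
Kalman filter updates:

$$\hat{h}_{i+1} = r\hat{h}_i + g_{i+1}\sqrt{\epsilon_{i+1} M T}\; e_{i+1|i} + g_{i+1}\,\mathbf{S}_{i+1,T}^\dagger \mathbf{z}_{i+1,T} \qquad (3)$$

$$\theta_{i+1} = \frac{\sigma_z^2\,\theta_{i+1|i}}{\epsilon_{i+1} M\,\theta_{i+1|i} + \sigma_z^2} \qquad (4)$$

where

$$g_{i+1} = \sqrt{\frac{\epsilon_{i+1} M}{T}}\;\frac{\theta_{i+1|i}}{\epsilon_{i+1} M\,\theta_{i+1|i} + \sigma_z^2} \qquad (5)$$

$$e_{i+1|i} = h_{i+1} - r\hat{h}_i \qquad (6)$$

$$\theta_{i+1|i} = r^2\theta_i + (1 - r^2)\sigma_h^2. \qquad (7)$$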

It is straightforward to show that the channel estimate $\hat{h}_i$ in (3) does not depend on $T$. (What
is important is the total pilot energy per coherence block.) Hence the data rate is maximized by
taking $T = 1$ with fixed $\epsilon_i$ (i.e., the training power $P_{i,T} = \epsilon_i M$). Training therefore requires an
overhead of $1/M$ fraction of the channel uses. We ignore this overhead for the time being and
focus on optimizing the training power $\epsilon_i$. This issue is revisited in Sec. IX.
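As a rough numerical illustration of the variance recursion, the following sketch iterates the prediction step (7) and the measurement update (4), taking (4) in the standard scalar Kalman form $\theta_{i+1} = \sigma_z^2\theta_{i+1|i}/(\epsilon M\theta_{i+1|i} + \sigma_z^2)$. That form, the function names, and the parameter values are assumptions made for illustration. Note that $T$ does not appear; only the pilot energy per block $\epsilon M$ matters.

```python
def predict_variance(theta, r, sigma_h2):
    """Prediction step (7): error variance before the new pilot is observed."""
    return r**2 * theta + (1.0 - r**2) * sigma_h2


def update_variance(theta_pred, eps, M, sigma_z2):
    """Measurement update, assumed form of (4): depends on eps * M, not on T."""
    return sigma_z2 * theta_pred / (eps * M * theta_pred + sigma_z2)


def steady_state_variance(eps, M, r, sigma_h2, sigma_z2, n_blocks=10_000):
    """Iterate predict/update with constant training energy eps per symbol."""
    theta = sigma_h2  # no channel knowledge at the start
    for _ in range(n_blocks):
        theta_pred = predict_variance(theta, r, sigma_h2)
        theta = update_variance(theta_pred, eps, M, sigma_z2)
    return theta
```

With `eps = 0` the variance stays at `sigma_h2` (no training, no knowledge); any positive `eps` drives it to a fixed point strictly below `sigma_h2`.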

We wish to determine $P_i$ and $\epsilon_i$, which maximize the achievable rate. Specifically, the channel
estimate $\hat{h}_i$ and variance $\theta_i$ determine the data power in the current coherence block, $P_i$, and the
pilot power in the next coherence block, $\epsilon_{i+1}$. We assume that the transmitter codes over many
coherence blocks, and use the following lower bound on ergodic capacity, which accounts for
channel estimation error, as the performance objective [11], [30],

$$R(P_i, \hat{\mu}_i, \theta_i) = \log\left(1 + \frac{P_i\,\hat{\mu}_i}{P_i\,\theta_i + \sigma_z^2}\right) \qquad (8)$$

where $\hat{\mu}_i = |\hat{h}_i|^2$. In the next section, we formulate the joint pilot and data power optimization
problem, and subsequently characterize the optimal power control policy implicitly as the solution
to a discrete-time Bellman equation.

III. Dynamic Programming Formulation 
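The pilot and data power control problem can be stated as

$$\max_{\{P_i,\,\epsilon_i\}} \; \liminf_{n\to\infty} \frac{1}{n}\, E\left[\sum_{i=0}^{n-1} R(P_i, \hat{\mu}_i, \theta_i)\right]$$

$$\text{subject to: } \limsup_{n\to\infty} \frac{1}{n}\, E\left[\sum_{i=0}^{n-1} \epsilon_i + \sum_{i=0}^{n-1} P_i\right] \le P_{\rm av} \qquad (9)$$

$$\text{and } \epsilon_i \le \epsilon_{\max}$$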

where the expectation is over the sequence of channel gains. We have imposed an additional
peak power constraint on the training power. This is a discrete-time Markov control problem,
so that the solution can be formulated as an infinite-horizon dynamic program with an average
value objective. The system state at time (block) $i$ is $S_i = (\hat{\mu}_i, \theta_i)$, and the action maps the state
to the power pair $(P_i, \epsilon_{i+1})$. To see that $S_i$ is the system state, note that $e_{i+1|i}$ in (6) and $\hat{h}_i$ are
independent random variables, hence it follows from (3) and (4) that the probability distribution
of $S_{i+1}$ is determined only by $S_i$ and the action $\epsilon_{i+1}$. The process $\{(\hat{\mu}_i, \theta_i)\}$ is therefore a Markov
chain driven by the control $\{\epsilon_i\}$.

The average power constraint in (9) can be included in the objective through a Lagrange
multiplier giving the relaxed problem

$$\max_{P_i,\; 0 \le \epsilon_i \le \epsilon_{\max}} \; \liminf_{n\to\infty} \frac{1}{n}\, E\left[\sum_{i=0}^{n-1} \big[R(P_i, \hat{\mu}_i, \theta_i) - \lambda(\epsilon_i + P_i)\big]\right] \qquad (10)$$

where $\lambda$ is chosen to enforce the constraint (9). If there exists a bounded function $V(\hat{\mu}, \theta)$ and
a constant $C$, which satisfy the Bellman equation

$$V(\hat{\mu}, \theta) + C = \max_{P,\; 0 \le \epsilon \le \epsilon_{\max}} \left\{ R(P, \hat{\mu}, \theta) - \lambda(\epsilon + P) + E_{\epsilon,(\hat{\mu},\theta)}[V] \right\} \qquad (11)$$

then an optimal policy maximizes the right-hand side [20]. The function $V(\cdot,\cdot)$ is called an
"auxiliary value function", and $C$ is the maximum value of the objective in (10). The expectation
$E_{\epsilon,(\hat{\mu},\theta)}[\cdot]$ is over the conditional probability of $S_{i+1}$ given $S_i = (\hat{\mu}, \theta)$ and action $\epsilon_{i+1} = \epsilon$.
Using the channel state evolution equations derived in Section II we have

$$E_{\epsilon,(\hat{\mu},\theta)}[V] = \int_0^\infty V(u, \theta_{i+1})\, f_{\hat{\mu}_{i+1}|S_i}(u)\, du \qquad (12)$$

where $f_{\hat{\mu}_{i+1}|S_i}(u)$ is the conditional density of $\hat{\mu}_{i+1} = |\hat{h}_{i+1}|^2$ given $S_i = (\hat{\mu}_i, \theta_i) = (\hat{\mu}, \theta)$, and
$\theta_{i+1}$ is given by (4). From (3) it follows that $f_{\hat{\mu}_{i+1}|S_i}(u)$ is Ricean with noncentrality parameter
$r^2\hat{\mu}$ and variance $[(\theta_{i+1|i}\,\epsilon M)/\sigma_z^2]\,\theta_{i+1} + r^2\hat{\mu}$, where $\theta_{i+1}$ and $\theta_{i+1|i}$ are given by (4) and (7),
respectively.

IV. Diffusion Limit 

The Bellman equation (11) is an integral fixed point equation, and appears to be difficult to
solve analytically. To gain insight into properties of optimal policies, we consider the following
scaling, corresponding to a low SNR regime:

1) Time is scaled by the factor $1/N$, where $N \gg M$ is large, so that each coherence block
of $M$ symbols corresponds to $\delta t = M/N$ time units. Therefore one time unit in the scaled
system contains $1/\delta t = N/M$ coherence blocks, or equivalently $N$ channel uses.

2) The correlation between adjacent coherence blocks is $r = 1 - \rho\,\delta t$, where $\rho$ is a constant.
Hence this correlation goes to one as $N \to \infty$ (equivalently, the channel coherence time
goes to zero), but with fixed correlation between blocks separated by $N$ channel uses.

3) To maintain constant energy over $N$ channel uses, the average power $P_{\rm av}$, data power $P_i$,
training power $\epsilon_i$, and the maximum training power $\epsilon_{\max}$ are each scaled by $1/N$.

In the limit as $N \to \infty$, it can be shown that the discrete-time, complex, Gauss-Markov process
$\{h_i\}$, given by (2), converges weakly to a continuous-time Ornstein-Uhlenbeck diffusion process
$h(t)$ (e.g., see [31, Ch. 8]). Furthermore, the limiting channel process satisfies the stochastic
differential equation (SDE)

$$dh(t) = -\rho\, h(t)\, dt + \sqrt{2\rho}\,\sigma_h\, dB(t), \qquad (13)$$

where $B(t)$ is complex Brownian motion, and we assume that the initial state $h(0)$ is a CSCG
random variable with zero mean and variance $\sigma_h^2$. This is a Gauss-Markov process, which is
continuous in probability, and has autocorrelation function

$$\Phi(\tau) = \sigma_h^2\, e^{-\rho\tau}, \qquad (14)$$

where $\tau$ is the lag between the time samples of the channel. Hence $\rho$ determines how fast the
channel varies relative to the symbol rate. For example, the end-to-end normalized correlation
across an interval of $\tau = 1$ (or, equivalently, $N$ channel uses) is $e^{-\rho}$.
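The consistency between the block-level scaling $r = 1 - \rho\,\delta t$ and the limiting correlation $e^{-\rho}$ can be checked numerically. The sketch below is only an illustration (the function name and the values of $\rho$, $M$, $N$ are assumptions): it compounds the per-block correlation over the $N/M$ blocks that make up one scaled time unit.

```python
import math


def unit_lag_correlation(rho, M, N):
    """Correlation across one scaled time unit (N channel uses) in the
    discrete-time model, with per-block correlation r = 1 - rho * dt."""
    dt = M / N                 # duration of one coherence block in scaled time
    r = 1.0 - rho * dt         # correlation between adjacent blocks
    n_blocks = round(N / M)    # coherence blocks per unit of scaled time
    return r ** n_blocks


# As N grows the compounded correlation approaches exp(-rho).
for N in (200, 2000, 20000):
    print(N, unit_lag_correlation(2.0, 1, N), math.exp(-2.0))
```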

The diffusion limit considered can be interpreted as zooming out on the channel and associated
data transmission. Segments of the discrete channel process then become "compressed" in
time, but with increasing correlation between successive coherence blocks so that the channel
autocorrelation remains fixed. Prior work that advocates the use of diffusion models
for wireless channels is presented in [23], [26]. In this limit the Kalman filter continuously
estimates the channel process, and the pilot and data powers are continuously updated based on
continuous feedback. (We will see that to achieve optimal performance the feedback need not
be continuous.) The optimal power control policy in the diffusion limit can then be interpreted
as an approximation for the optimal discrete-time policy. (This will be illustrated numerically.)

The power scaling by $1/N$ can be interpreted as introducing $N$ parallel, independent and
statistically identical sub-channels over which the power is equally split. Hence the low SNR
regime corresponds to a wideband channel.³

In the diffusion limit the channel estimate and estimation error updates given by (3) and (4),
respectively, become the dynamic equations

$$d\hat{h}(t) = -\rho\,\hat{h}(t)\, dt + \theta(t)\sqrt{\frac{\epsilon(t)}{\sigma_z^2}}\; d\bar{B}(t), \qquad (15)$$

$$\frac{d\theta(t)}{dt} = 2\rho\big(\sigma_h^2 - \theta(t)\big) - \frac{\epsilon(t)\,\theta^2(t)}{\sigma_z^2}, \qquad (16)$$

where $\bar{B}(t)$ is a complex Brownian motion independent of $B(t)$, and $\epsilon(t)$ is the pilot power at
time $t$. A heuristic derivation of (13), (15) and (16) from the discrete-time equations (2), (3) and
(4) is given in Appendix I.

Note that both $h(t) = h_r(t) + j\,h_j(t)$ and $\hat{h}(t) = \hat{h}_r(t) + j\,\hat{h}_j(t)$ are complex. The following
SDE defining the evolution of the channel estimate $\hat{\mu}(t) = |\hat{h}(t)|^2 = \hat{h}_r^2(t) + \hat{h}_j^2(t)$ can be obtained
from (15) and a straightforward application of Ito's Lemma [25, Ch. 4]

$$d\hat{\mu}(t) = \left[-2\rho\,\hat{\mu}(t) + \frac{\epsilon(t)\,\theta^2(t)}{\sigma_z^2}\right] dt + 2\,\theta(t)\sqrt{\frac{\epsilon(t)}{\sigma_z^2}}\;\hat{h}_r(t)\, d\bar{B}_r(t) + 2\,\theta(t)\sqrt{\frac{\epsilon(t)}{\sigma_z^2}}\;\hat{h}_j(t)\, d\bar{B}_j(t) \qquad (17)$$

where $\bar{B}_r(t)$ and $\bar{B}_j(t)$ are, respectively, the real and imaginary parts of the complex Brownian
motion $\bar{B}(t)$. We can re-write (17) as

$$d\hat{\mu}(t) = \left[-2\rho\,\hat{\mu}(t) + \frac{\epsilon(t)\,\theta^2(t)}{\sigma_z^2}\right] dt + \theta(t)\sqrt{\frac{2\,\epsilon(t)\,\hat{\mu}(t)}{\sigma_z^2}}\; d\tilde{B}(t), \qquad (18)$$

where $\tilde{B}(t)$ is a real-valued standard Brownian motion.

3 This bandwidth scaling is simply an interpretation of the effect of the power scaling. The diffusion
process is still associated with a flat Rayleigh fading channel.

Lemma 1: Given $\epsilon(t) \in [0, \epsilon_{\max}]$, the state process $S(t) = (\hat{\mu}(t), \theta(t))$, which is the solution
to the stochastic differential equations (18) and (16), has continuous sample paths.

The proof is given in Appendix II. Note that this Lemma does not require the control input
$\epsilon(t)$ to be continuous in time. This observation will be useful in the subsequent discussion.

We now consider the continuous-time limit of the optimization problem (9). If the data
power for the $i$-th discrete coherence block is $P_i/N$, then for large (but finite) $N$, the objective
$R(P_i/N, \hat{\mu}_i, \theta_i)$ becomes close to the continuous objective $R[P(t)/N, \hat{\mu}(t), \theta(t)]$, where $R(\cdot)$
is given by (8) and the index $i$ corresponds to time $t$. Our problem is then to choose $\epsilon(t)$
and $P(t)$, as a function of the state $(\hat{\mu}(t), \theta(t))$, to maximize the accumulated rate function
(over time), averaged over the channel process $h(t)$. Equivalently, we can maximize the scaled
objective $N R[P(t)/N, \hat{\mu}(t), \theta(t)]$ (corresponding to the sum rate over $N$ parallel sub-channels).
A difficulty is that this objective is unbounded as $N \to \infty$. To simplify the analysis, we first
take $N = 1$, which lower bounds the objective for all $N > 1$ (and corresponds to scaling up
the power in the diffusion limit). After characterizing the optimal policy we then replace the
objective with the preceding scaled objective with fixed $N$ to generate numerical results.⁴

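We therefore rewrite the discrete-time optimization (9) as the continuous-time control problem

$$\max_{(P(t),\,\epsilon(t))} \; \liminf_{t\to\infty} \frac{1}{t}\, E\left[\int_0^t R\big(P(s), \hat{\mu}(s), \theta(s)\big)\, ds\right]$$

$$\text{subject to: } \limsup_{t\to\infty} E\left[\frac{1}{t}\int_0^t \epsilon(s)\, ds + \frac{1}{t}\int_0^t P(s)\, ds\right] \le P_{\rm av} \qquad (19)$$

$$\text{and } \epsilon(t) \le \epsilon_{\max}.$$

4 In Sec. VII-A we discuss the rate of growth of the scaled rate objective as $N \to \infty$.

Analogous to (11), the Bellman equation can be written as (see [32])

$$C = \max_{P,\; 0 \le \epsilon \le \epsilon_{\max}} \left\{ R(P, \hat{\mu}, \theta) - \lambda(\epsilon + P) + A_\epsilon[V(\hat{\mu}, \theta)] \right\} \qquad (20)$$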

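where $A_\epsilon$ is the generator of the state process $(\hat{\mu}(t), \theta(t))$ with pilot power $\epsilon(t)$ [25, Ch. 7], and
is given by

$$A_\epsilon[V] = \frac{E[dV]}{dt} = a + \epsilon\, b \qquad (21)$$

where

$$a = \frac{\partial V}{\partial \hat{\mu}}\,(-2\rho\hat{\mu}) + \frac{\partial V}{\partial \theta}\,(-2\rho\theta + 2\rho\sigma_h^2) \qquad (22)$$

$$b = \frac{\theta^2}{\sigma_z^2}\left[\frac{\partial V}{\partial \hat{\mu}} - \frac{\partial V}{\partial \theta} + \hat{\mu}\,\frac{\partial^2 V}{\partial \hat{\mu}^2}\right] \qquad (23)$$

and the dependence on $t$ is omitted for notational convenience. Here we ignore existence issues,
and simply assume that there exists a bounded, continuous, and twice differentiable function
$V(\cdot,\cdot)$ satisfying (20). Note that $V(\cdot,\cdot)$ is unique only up to a constant [20, Ch. 4], [32].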

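Theorem 1: Given the pilot power constraint $\epsilon \in [0, \epsilon_{\max}]$, the optimal pilot power control
policy is given by

$$\epsilon^\star = \begin{cases} \epsilon_{\max} & \text{if } b - \lambda > 0 \\ 0 & \text{otherwise.} \end{cases} \qquad (24)$$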

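In words, optimal pilot power control is achieved by a switching (bang-bang) policy. This
follows immediately from substituting the generator $A_\epsilon$, given by (21)-(23), into (20), i.e.,

$$C = J(\hat{\mu}, \theta, \lambda) + \max_{0 \le \epsilon \le \epsilon_{\max}} \big[a + \epsilon(b - \lambda)\big] \qquad (25)$$

where $J(\hat{\mu}, \theta, \lambda) = \max_P\,[R(P, \hat{\mu}, \theta) - \lambda P]$. Because the bracketed term is linear in $\epsilon$, the
maximum is attained at an endpoint of $[0, \epsilon_{\max}]$. Substituting (24) into (25) gives the final version
of the Bellman equation

$$C = J(\hat{\mu}, \theta, \lambda) + a + \epsilon_{\max}\,(b - \lambda)^+ \qquad (26)$$

where $(x)^+ = \max\{0, x\}$. An alternative way to arrive at (25) and (26) is to take the diffusion
limit of the discrete-time Bellman equation (11). This alternative derivation is given in Appendix III.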

It is easily shown that the optimal data power allocation is

$$P^\star(\hat{\mu}, \theta, \lambda) = \arg\max_P\,[R(P, \hat{\mu}, \theta) - \lambda P] \qquad (27)$$

$$= \frac{-\lambda\sigma_z^2\,(2\theta + \hat{\mu}) + \sqrt{\Delta}}{2\lambda\theta\,(\hat{\mu} + \theta)} \qquad (28)$$

where $\Delta = \lambda^2\hat{\mu}^2\sigma_z^4 + 4\lambda\hat{\mu}^2\theta\sigma_z^2 + 4\lambda\theta^2\hat{\mu}\sigma_z^2$ and $\lambda$ determines $P_{\rm av}$ in (19). Note that $P^\star(\hat{\mu}, \theta, \lambda) > 0$
for $\hat{\mu} > \lambda\sigma_z^2$. This power allocation is the same as that obtained in [7], which considers a
fading channel with constant estimation error, as opposed to the time-varying estimation error
in our model.
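The closed form (28) follows from setting the derivative of $R(P, \hat{\mu}, \theta) - \lambda P$ to zero, which yields a quadratic in $P$. The sketch below is a minimal illustration under stated assumptions: natural logarithms in (8), the discriminant $\Delta = \lambda^2\hat{\mu}^2\sigma_z^4 + 4\lambda\hat{\mu}^2\theta\sigma_z^2 + 4\lambda\theta^2\hat{\mu}\sigma_z^2$ obtained from that quadratic, clipping to zero for $\hat{\mu} \le \lambda\sigma_z^2$, and hypothetical function names. The $\theta = 0$ branch is the classical waterfilling limit.

```python
import math


def rate_lower_bound(P, mu_hat, theta, sigma_z2=1.0):
    """Rate lower bound (8), in nats (natural log is an assumption)."""
    return math.log(1.0 + P * mu_hat / (P * theta + sigma_z2))


def optimal_data_power(mu_hat, theta, lam, sigma_z2=1.0):
    """Maximizer of R(P, mu_hat, theta) - lam * P over P >= 0, as in (27)-(28)."""
    if mu_hat <= lam * sigma_z2:
        return 0.0                      # channel estimate too weak: stay silent
    if theta == 0.0:
        return 1.0 / lam - sigma_z2 / mu_hat   # perfect-CSI waterfilling limit
    delta = (lam**2 * mu_hat**2 * sigma_z2**2
             + 4.0 * lam * mu_hat**2 * theta * sigma_z2
             + 4.0 * lam * theta**2 * mu_hat * sigma_z2)
    numerator = -lam * sigma_z2 * (2.0 * theta + mu_hat) + math.sqrt(delta)
    return numerator / (2.0 * lam * theta * (mu_hat + theta))


# Example: the marginal rate at the optimum should equal the price lam.
P_opt = optimal_data_power(mu_hat=2.0, theta=0.3, lam=0.5)
h = 1e-6
slope = (rate_lower_bound(P_opt + h, 2.0, 0.3)
         - rate_lower_bound(P_opt - h, 2.0, 0.3)) / (2.0 * h)
print(P_opt, slope)
```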




Fig. 1. Illustration of the dynamics of the optimal switching policy for pilot power control. 



V. Behavior of the Optimal Policy 

From Theorem 1 the optimal pilot power control policy is determined by the switching
boundary in the state space $(\hat{\mu}, \theta)$, which is defined by the condition $b = \lambda$. This is referred to
as a "free boundary" condition for the Bellman PDE (20) [27], [28].

The dynamical behavior of the optimal pilot power control policy is illustrated in Fig. 1. The
vertical and horizontal axes correspond to the state variables $\hat{\mu}$ and $\theta$, respectively. The shaded
region, $D_\epsilon$, is the region of the state space in which $\epsilon = \epsilon_{\max}$, and $\epsilon = 0$ in the complementary
region $D_0$. These two regions are separated by the free boundary, AC. The penalty factor $\lambda$
determines the position of this boundary, and the associated value of $P_{\rm av}$.
Fig. 2. Trace of the state $(\hat{\mu}, \theta)$ obtained by simulating (18) and (16) with bang-bang control. Parameters are $M = 1$, $N = 200$,
so that $\delta t = M/N = 0.005$, and $\epsilon_{\max} = 12$ (10.8 dB), $\rho = 2$, $\sigma_h^2 = 1$, $\sigma_z^2 = 1$. A higher density of dots (darker regions)
corresponds to higher steady-state probabilities. Also shown is the free boundary computed via the diffusion model. The state
lies along the free boundary and the $\theta = \theta^*$ line. Also, the probability decreases with $\hat{\mu}$.

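The vertical line A'A'' in the figure corresponds to the estimation error variance $\theta^*$, which
results from taking $\epsilon = \epsilon_{\max}$ for all $t$. Clearly, in steady state the estimation error variance cannot
be lower than this value, hence the steady-state probability density function (pdf) of the state
$(\hat{\mu}, \theta)$ is zero for $\theta < \theta^*$. Substituting $\epsilon = \epsilon_{\max}$ in (16) and setting $d\theta/dt = 0$ gives

$$\theta^* = \frac{-\rho\sigma_z^2 + \sqrt{\rho^2\sigma_z^4 + 2\rho\,\epsilon_{\max}\,\sigma_h^2\sigma_z^2}}{\epsilon_{\max}}. \qquad (29)$$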
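The minimum error variance $\theta^*$ is the positive root of the quadratic obtained by setting the drift in (16) to zero with $\epsilon = \epsilon_{\max}$. A small sketch, assuming the drift $2\rho(\sigma_h^2 - \theta) - \epsilon\theta^2/\sigma_z^2$ and using the Fig. 2 parameters as an example (function names are illustrative):

```python
import math


def theta_drift(theta, eps, rho, sigma_h2=1.0, sigma_z2=1.0):
    """Right-hand side of the error-variance ODE (16)."""
    return 2.0 * rho * (sigma_h2 - theta) - eps * theta**2 / sigma_z2


def theta_star(eps_max, rho, sigma_h2=1.0, sigma_z2=1.0):
    """Positive root of theta_drift(theta, eps_max, ...) = 0."""
    disc = rho**2 * sigma_z2**2 + 2.0 * rho * eps_max * sigma_h2 * sigma_z2
    return (-rho * sigma_z2 + math.sqrt(disc)) / eps_max


# Fig. 2 parameters: eps_max = 12, rho = 2, unit channel and noise variances.
t_star = theta_star(12.0, 2.0)
print(t_star, theta_drift(t_star, 12.0, 2.0))
```

For these parameters the root lies strictly between 0 and $\sigma_h^2$, and the drift is negative for any larger $\theta$, which is why continuous training at $\epsilon_{\max}$ pulls the variance down to $\theta^*$ and no further.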



Suppose that the initial state is in $D_0$. With $\epsilon(t) = 0$ the state evolution equations (18) and
(16) become $d\hat{\mu}(t) = -2\rho\,\hat{\mu}(t)\,dt$ and $d\theta(t)/dt = 2\rho(\sigma_h^2 - \theta(t))$. Since $\hat{\mu}$ and $\sigma_h^2 - \theta$ decay
at the same rate $2\rho$, the state trajectory is a straight line towards the point $Z = (0, \sigma_h^2)$ until it hits
the free boundary, as illustrated in Fig. 1. Therefore, for $P_{\rm av} > 0$, $\lambda$ must be selected so that the
point $Z$ lies in $D_\epsilon$. Otherwise, the state trajectory eventually drifts to $Z$ and stays there,
corresponding to $\epsilon = 0$ for all $t$, $P_{\rm av} = 0$ (because $\hat{\mu} = 0$), and $R = 0$. If the trajectory hits the free
boundary below the point B, then it is pushed back into $D_0$. This is because at the boundary
$\epsilon(t) = \epsilon_{\max}$ and for $\theta(t) > \theta^*$, the drift term in (16) is negative, namely,
$d\theta/dt = 2\rho(\sigma_h^2 - \theta(t)) - \epsilon_{\max}\theta^2(t)/\sigma_z^2 < 0$. Otherwise, if
the trajectory hits the boundary above point B, it continues into $D_\epsilon$ and settles along the line
A'A'' where the drift $d\theta/dt = 0$. For the discrete-time model with small, but positive $\delta t$, the
state trajectory zig-zags around the boundary, as shown in Fig. 1. Hence if the free boundary
AC intersects A'A'' at point B, then in steady state the probability mass must be concentrated
along the curve ABC. This is verified through Monte Carlo simulations and illustrated in Fig. 2.
Points in the state space are shown corresponding to a realization generated from (18) and
(16) with $M = 1$, $N = 200$ (so that $\delta t = M/N = 0.005$).
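The zig-zag behavior around the switching boundary can be reproduced with a simple Euler-Maruyama discretization of (18) and (16). The sketch below is not the authors' simulation: it assumes the diffusion coefficient $\theta\sqrt{2\epsilon\hat{\mu}/\sigma_z^2}$ in (18), clips $\hat{\mu}$ at zero after each step, and takes the boundary as a user-supplied function `boundary(mu)` (a hypothetical interface). The example uses a vertical boundary at $\theta = 0.6$ purely for illustration.

```python
import math
import random


def simulate_bang_bang(boundary, eps_max=12.0, rho=2.0, sigma_h2=1.0,
                       sigma_z2=1.0, dt=0.005, n_steps=20000, seed=0):
    """Euler-Maruyama trace of (mu_hat, theta) under the switching policy:
    train at eps_max when theta exceeds boundary(mu_hat), else do not train."""
    rng = random.Random(seed)
    mu, theta = 0.0, sigma_h2          # start at Z: no channel knowledge
    path = []
    for _ in range(n_steps):
        eps = eps_max if theta > boundary(mu) else 0.0
        gain = eps * theta**2 / sigma_z2            # eps * theta^2 / sigma_z^2
        noise = rng.gauss(0.0, math.sqrt(dt))       # real Brownian increment
        mu_next = (mu + (-2.0 * rho * mu + gain) * dt
                   + math.sqrt(2.0 * gain * mu) * noise)
        theta += (2.0 * rho * (sigma_h2 - theta) - gain) * dt
        mu = max(mu_next, 0.0)                      # keep |h_hat|^2 non-negative
        path.append((mu, theta))
    return path


# Vertical boundary at theta = 0.6 (illustrative choice, above theta* ~ 0.43).
trace = simulate_bang_bang(lambda mu: 0.6)
thetas = [th for _, th in trace[2000:]]
print(min(thetas), max(thetas))
```

After a short transient the error variance stays within a band of width proportional to `dt` around the boundary, in line with the concentration of probability mass described above.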

The preceding discussion suggests that the steady-state probability associated with states not
on the curve defined by the free boundary and $\theta = \theta^*$ tends to zero in the continuous-time limit.
This is stated formally in the next section. We also remark that in region $D_0$ the PDE (26) is a
"transport equation" [27], which has an analytical solution containing an arbitrary function of a
single variable. Determining this function and the constant $C$ appears to be difficult, so that we
will take an alternative (more direct) approach to determining the free boundary.

VI. Steady State Behavior With Switching Policy 

In this section we characterize the steady-state behavior of the state trajectory with the optimal 
switching (bang-bang) training policy, and compare with some simpler policies. In particular,
we give the first-order pdf over the free boundary, which we subsequently use to compute the 
optimized boundary explicitly. 

We will denote the free boundary as $\theta_\epsilon(\hat{\mu})$ for $\hat{\mu} > 0$. To simplify the analysis we make the
following assumptions:

(P1) The free boundary $\theta_\epsilon(\cdot) : [0, \infty) \to (\theta^*, \sigma_h^2)$ is a continuously differentiable curve
such that

$$\sigma_h^2 - \theta_\epsilon(x) + x\,\frac{d\theta_\epsilon(x)}{dx} > 0 \quad \text{for } x > 0. \qquad (30)$$

(P2) The function $\theta_\epsilon(\cdot)$ is one-to-one, i.e., for any $x_1, x_2 > 0$ such that $x_1 \ne x_2$, $\theta_\epsilon(x_1) \ne \theta_\epsilon(x_2)$.

Note that (P1) requires $\epsilon_{\max}$ to be large enough so that the entire free boundary (AC in Fig. 1)
lies to the right of $\theta = \theta^*$. (That is, they do not intersect.) The condition (30) on the derivative
of the free boundary curve is mild. Geometrically, it implies that the region enclosed by the
free boundary AC and point $Z = (0, \sigma_h^2)$ is convex. This condition is indeed satisfied by the
optimized free boundaries computed in later sections.

Proposition 1: Let the pilot power as a function of the state $(\hat{\mu}(t), \theta(t))$ be given by

$$\epsilon(t) = \begin{cases} \epsilon_{\max} & \text{if } \theta(t) > \theta_\epsilon(\hat{\mu}(t)) \\ 0 & \text{otherwise.} \end{cases} \qquad (31)$$

Then for any $\eta > 0$, the solution to (18) and (16) satisfies

$$\lim_{t\to\infty} \Pr\{|\theta(t) - \theta_\epsilon(\hat{\mu}(t))| > \eta\} = 0. \qquad (32)$$

The proof is given in Appendix IV. The proposition implies that for large $t$ the state $(\hat{\mu}(t), \theta(t))$
moves along the free boundary $\{\theta_\epsilon(\hat{\mu})\}$. Hence for the discrete-time system with large $N$, the
state is typically confined to a narrow strip around the free boundary.⁵

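Theorem 2: Given the pilot power control (31), the steady-state probability of training
conditioned on the channel estimate $\hat{\mu} = u$ is

$$p(u) = \lim_{t\to\infty} \Pr\{\theta(t) > \theta_\epsilon(\hat{\mu}(t)) \mid \hat{\mu}(t) = u\} = \frac{2\rho\sigma_z^2\,[\sigma_h^2 - \theta_\epsilon(u)]}{\epsilon_{\max}\,[\theta_\epsilon(u)]^2} \qquad (33)$$

and the steady-state pdf of the channel estimate $\hat{\mu}$ is

$$f_{\hat{\mu}}(u) = \frac{1}{\sigma_h^2 - \theta_\epsilon(u)} \exp\left(-\int_0^u \frac{1}{\sigma_h^2 - \theta_\epsilon(s)}\, ds\right), \quad u \ge 0. \qquad (34)$$

The proof is given in Appendix V. From (33) the average training power for the pilot power
control scheme can be computed as

$$\epsilon_{\rm avg} = \int_0^\infty \frac{2\rho\sigma_z^2\,(\sigma_h^2 - \theta_\epsilon(u))}{[\theta_\epsilon(u)]^2}\, f_{\hat{\mu}}(u)\, du. \qquad (35)$$

Therefore if $\epsilon_{\max}$ is large enough so that $\theta^* < \theta_\epsilon(u)$ for all $u > 0$, then neither the pdf $f_{\hat{\mu}}(\cdot)$
nor the average training power $\epsilon_{\rm avg}$ depends on $\epsilon_{\max}$. This is because as $\epsilon_{\max}$ increases, the
probability of training, given by (33), decreases so that the average training power given $\hat{\mu}$,
namely $\epsilon_{\max}\,p(\hat{\mu})$, remains unchanged.⁶ In addition, we observe that $f_{\hat{\mu}}(\cdot)$ is independent of the
correlation parameter $\rho$.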
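The quantities in Theorem 2 are straightforward to evaluate numerically for a given boundary. The sketch below assumes the forms $p(u) = 2\rho\sigma_z^2(\sigma_h^2 - \theta_\epsilon(u))/(\epsilon_{\max}\theta_\epsilon(u)^2)$ and $f_{\hat{\mu}}(u) = \exp\!\big(-\int_0^u ds/(\sigma_h^2 - \theta_\epsilon(s))\big)/(\sigma_h^2 - \theta_\epsilon(u))$, uses a midpoint rule for the integral, and takes the boundary as a hypothetical callable `boundary(u)`. For a vertical boundary the pdf should reduce to an exponential with mean $\sigma_h^2 - \theta_v$.

```python
import math


def training_probability(u, boundary, eps_max, rho, sigma_h2=1.0, sigma_z2=1.0):
    """Assumed form of (33): steady-state probability of training given mu_hat = u."""
    th = boundary(u)
    return 2.0 * rho * sigma_z2 * (sigma_h2 - th) / (eps_max * th**2)


def steady_state_pdf(u, boundary, sigma_h2=1.0, steps=2000):
    """Assumed form of (34), with the inner integral done by the midpoint rule."""
    ds = u / steps
    integral = sum(ds / (sigma_h2 - boundary((k + 0.5) * ds)) for k in range(steps))
    return math.exp(-integral) / (sigma_h2 - boundary(u))


# Vertical boundary theta_v = 0.6: exponential pdf with mean 1 - 0.6 = 0.4.
vertical = lambda u: 0.6
print(steady_state_pdf(0.5, vertical), math.exp(-0.5 / 0.4) / 0.4)
print(training_probability(0.5, vertical, eps_max=12.0, rho=2.0))
```

Multiplying `training_probability` by `eps_max` gives the average training power at $\hat{\mu} = u$, which does not depend on `eps_max`, consistent with the observation above.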

5 This behavior of a controlled Markov process in which the initial state space reduces to a much smaller set under a certain
class of control inputs is called "state space collapse" [33].

6 This ignores the overhead due to the insertion of training symbols. According to the subsequent discussion in Sec. IX this
overhead is reduced by increasing $\epsilon_{\max}$. However, for the diffusion approximation to be accurate, $\epsilon_{\max} M/N$ must be small, hence
$\epsilon_{\max}$ cannot be too large.



August 10, 2009 DRAFT



Now consider the case in which the free boundary is constrained to be vertical, that is,
$\theta_\epsilon(\hat{\mu}) = \theta_v$, $\forall \hat{\mu} > 0$. This still corresponds to a switching policy, but where the variance of the
channel estimation error is constrained to be a constant, independent of the channel estimate.
From (34) the steady-state pdf of $\hat{\mu}$ is exponential, i.e., $f_{\hat{\mu}}(u) = \frac{1}{\sigma_h^2 - \theta_v}\exp\left(-\frac{u}{\sigma_h^2 - \theta_v}\right)$, and the
average training power is $\epsilon_v = \frac{2\rho\sigma_z^2(\sigma_h^2 - \theta_v)}{\theta_v^2}$.

Now consider the constant pilot power control policy, where $\epsilon(t) = \epsilon_v$ (constant) for all $t$.
Substituting $\epsilon = \epsilon_v$ into (16) and setting $d\theta/dt = 0$ implies that the steady-state estimation error
variance $\theta(t) = \theta_v$ for all $t$. In addition, the steady-state pdf of $\hat{\mu}$ is exponential with mean
$\sigma_h^2 - \theta_v$. Hence for a given average training power, constant pilot power can give exactly the
same estimation error and steady-state pdf as the switching policy with a vertical boundary. Both
schemes therefore achieve the same rate with a total power constraint (ignoring overhead due
to pilot insertion). We will see that the optimized boundary is not vertical, which implies that
adaptive pilot power control can perform better than constant pilot power.

We also observe that the same performance as the optimal switching policy can be achieved by
continuously varying the training power as a function of $\hat{\mu}$. Namely, taking $\epsilon(\hat{\mu}) = \frac{2\rho\sigma_z^2(\sigma_h^2 - \theta_\epsilon(\hat{\mu}))}{[\theta_\epsilon(\hat{\mu})]^2}$
gives the same steady-state pdf and training power as in (34) and (35), respectively. However,
this scheme corresponds to feeding back the pilot power as a sequence of real numbers, which
in principle requires infinite precision. In contrast, the switching policy can be implemented by
fixing the training power and varying the rate at which pilot symbols are inserted. The transmitter
therefore does not need to know the exact value of the channel estimate.

More specifically, the optimal switching policy inserts pilots of power $\epsilon_{\max}$ with probability
$q = \frac{2\rho\sigma_z^2(\sigma_h^2 - \theta_\epsilon(u))}{\epsilon_{\max}[\theta_\epsilon(u)]^2}$ (or equivalently, once every $1/q$ coherence blocks) when the channel estimate
$\hat{\mu} = u$. This requires at most one bit per coherence block to inform the transmitter whether
or not to train in the next block. (Of course, the feedback can be substantially reduced by
exploiting channel correlations.) The switching policy therefore requires fewer training symbols
than continuous pilot power control, which requires a pilot symbol every coherence block.

VII. Free Boundary with Optimal Data Power Allocation 

From the preceding discussion the optimal pilot policy is determined by the free boundary.
Here we compute the free boundary by observing that this boundary must maximize the rate
objective, assuming a switching policy for the pilot power. A difficulty is that the rate objective
of interest is $NR(P/N, \hat\mu, \theta)$, whereas the steady-state probabilities in Theorem 2 were derived
in the limit as $N \to \infty$. In what follows we use the asymptotic probabilities in Theorem 2 to
approximate the steady-state probabilities corresponding to large but finite $N$. Simulation results
have shown that the resulting free boundary is insensitive to the choice of $N$ in the objective.
Also, subsequent simulation results in Section VII-C show that the analytical performance results
accurately predict the performance of the corresponding discrete-time model with the optimal
switching policy when $N$ is a few hundred.

A. Analytical Solution 

With the preceding approximation for large but finite $N$ the optimal free boundary can be
computed as the solution to the following functional optimization problem,

$$\max_{P(u),\,\theta_\epsilon(u)} \int_0^\infty N R[P(u)/N, u, \theta_\epsilon(u)]\, f_{\hat\mu}(u)\, du$$

$$\text{subject to: } \int_0^\infty P(u) f_{\hat\mu}(u)\, du + \int_0^\infty \epsilon(u) f_{\hat\mu}(u)\, du \le P_{av}, \tag{36}$$

$$\text{and } \theta_\epsilon(u) \ge \theta^* \text{ for } u \ge 0,$$

where $\epsilon(u) = 2\rho\sigma_z^2(\sigma_h^2 - \theta_\epsilon(u))/\theta_\epsilon^2(u)$ can be interpreted as the average training power when the estimate
$\hat\mu = u$. The objective is the achievable rate averaged over the free boundary since for large $N$,
the entire probability mass becomes concentrated on the boundary.
To proceed, define the Lagrangian function as

$$L_\lambda[P, u, \theta] = N R(P/N, u, \theta) - \lambda\left(P + \frac{2\rho\sigma_z^2(\sigma_h^2 - \theta)}{\theta^2}\right). \tag{37}$$

Analogous to (10), the optimization problem (36) can be re-stated as

$$\max_{P(u),\,\theta_\epsilon(u)} \int_0^\infty L_\lambda[P(u), u, \theta_\epsilon(u)]\, f_{\hat\mu}(u)\, du \tag{38}$$

such that $\theta_\epsilon(u) \ge \theta^*$ and $\lambda$ is chosen to satisfy the power constraint (36) with equality. It is
shown in Appendix VI that the solution is given by

$$\theta_\epsilon^*(u) = \max\{\theta^*, \theta_f(u)\}, \tag{39}$$

where $\theta^*$ is defined in (29), and $\theta_f(u)$ satisfies



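$$(\sigma_h^2 - \theta_f(u))\,\frac{\partial L_\lambda}{\partial\theta}[N P_d(u, \theta_f(u), \lambda), u, \theta_f(u)] + L_\lambda[N P_d(u, \theta_f(u), \lambda), u, \theta_f(u)]$$

$$= \int_u^\infty L_\lambda[N P_d(v, \theta_f(v), \lambda), v, \theta_f(v)]\, \frac{1}{\sigma_h^2 - \theta_f(v)} \exp\left[-\int_u^v \frac{ds}{\sigma_h^2 - \theta_f(s)}\right] dv \tag{40}$$

for $u \ge 0$, where

$$\frac{\partial}{\partial\theta} L_\lambda[P, u, \theta] = -\frac{N P^2 u}{(P\theta + Pu + N\sigma_z^2)(P\theta + N\sigma_z^2)} + \frac{2\lambda\rho\sigma_z^2(2\sigma_h^2 - \theta)}{\theta^3} \tag{41}$$

and $P_d(\cdot, \cdot, \cdot)$ is given by (28). The optimal data power allocation is given by $P^*(u) = N P_d(u, \theta_\epsilon^*(u), \lambda)$.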

The condition (40) gives the value $\theta_f(u)$ as a functional of the free boundary $\theta_f(x)$ for $x > u$.
Hence we can compute the boundary numerically via a backward recursion provided that $\theta_f(u)$
is known for large values of $u$. Moreover, it can be shown that $\theta_f(u)$ is a decreasing function
of $u$ and as $u \to \infty$, it converges to a constant value, that is, $\theta_f(u) \to \theta_\infty$. Taking the limit
$u \to \infty$ on both sides of (40), this value can be shown to satisfy



^ V TT W +1 

Hence as long as $\epsilon_{\max}$ is large enough so that $\theta^* < \theta_\infty$, the free boundary is given by $\theta_f(u)$,
independent of $\epsilon_{\max}$.

Substituting $u = 0$ into (40), and using the fact that $\theta_\epsilon^*(u) = \theta_f(u)$ for all $u \ge 0$ (since
$\theta^* < \theta_\infty$), and $P^*(0) = 0$, we obtain

$$\frac{[\sigma_h^2 - \theta_\epsilon^*(0)]^2}{[\theta_\epsilon^*(0)]^3} = \frac{1}{4\rho\sigma_z^2}\left(\frac{R}{\lambda} - P_{av}\right) \tag{43}$$

where $R$ is the optimized objective in (36). Note that this relation depends on $N$ only through
the water-filling level $\lambda$. Clearly, the left-hand side of (43) is positive, which implies that we
should have $R > \lambda P_{av}$. Also, as $P_{av} \to 0$ (low SNRs), $R \to 0$ and $\lambda$ increases. Similarly, the
right-hand side decreases to zero as $\rho \to \infty$ (fast fading), so that in both cases $\theta_\epsilon^*(0) \to \sigma_h^2$.



B. Numerical Approach to Free Boundary Problem

The preceding approach to computing the optimal free boundary relies on the asymptotic pdf of
the state in Theorem 2. Alternatively, it is possible to solve the continuous-time Bellman equation
(26) directly. This is potentially useful for other scenarios in which the steady-state distribution
is more difficult to obtain. A challenge, however, is that the optimized free boundary is unknown
a priori, i.e., it is obtained as part of the solution. Hence none of the standard numerical methods
for solving PDEs, which rely on specified boundary conditions, can be directly applied.

It is shown in Appendix VII that a numerical solution to the free boundary Bellman equation
can be obtained by re-formulating the problem as a quadratic program. That method can be used
to obtain a solution to a general class of free boundary problems and does not require knowledge
of the steady-state statistics. However, such a numerical computation has high complexity, and
can be sensitive to parameter variations. In particular, the free boundary obtained from that
method is often irregular (not smooth) due to discretization and finite-precision effects.

C. Numerical Results 

Here we present some numerical examples of free boundaries obtained by solving the optimization
problem (36) along with performance results. The analytical (diffusion) results are also
compared with results from a Monte Carlo simulation of the discrete-time system. To solve (36)
we discretize the $\hat\mu$ axis and also truncate it at a value $u_t \gg \sigma_h^2$. For all of the results in this
section $\sigma_h^2 = \sigma_z^2 = 1$, and $\epsilon_{\max} = 15$ (11.76 dB). The SNR is then equal to $P_{av}$.

a) Free Boundary Examples: Fig. 3 shows free boundaries corresponding to $N = 1000$
and $\rho = 2$ for different SNRs. The channel correlation with lag 1000 is therefore $e^{-2} = 0.135$,
corresponding to relatively fast fading. (We abuse notation in this section by referring to $\hat\mu$ as a
particular realization of the channel estimate.) Also shown are the optimized vertical boundaries
with a switching policy (i.e., $\theta_\epsilon(u) = \theta_v$, $\forall u \ge 0$, with optimized $\theta_v$). Recall that the performance
with the vertical boundary is the same as training with a constant fraction of power, which results
in the estimation error $\theta_v$. For each boundary the data power allocation is given by the optimal
water-filling power allocation in (28).
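
The numerical parameters quoted in this section can be cross-checked with a few lines of code. This is a small sketch, assuming the first-order channel coefficient is $r = 1 - \rho\,\delta t$ with $\delta t = 1/N$ (the substitution used in Appendix I with one channel use per step); the helper names are illustrative only.

```python
import math

def to_db(x):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(x)

def lag_correlation(rho, steps):
    """Correlation after `steps` steps when each step uses r = 1 - rho/steps."""
    r = 1.0 - rho / steps
    return r ** steps

if __name__ == "__main__":
    print(round(to_db(15.0), 2))               # peak training power in dB
    print(round(lag_correlation(2.0, 1000), 3))  # compare with exp(-2)
    print(round(lag_correlation(0.5, 1000), 3))  # compare with exp(-0.5)
```

The discrete correlation $(1-\rho/N)^N$ approaches $e^{-\rho}$ as $N$ grows, which is why the lag-1000 correlations quoted in the text depend only on $\rho$.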

The free boundaries are shaped so that the estimation error is larger for small values of $\hat\mu$,
and smaller for larger values of $\hat\mu$. The reason for this is that the pdf of $\hat\mu$ is larger when
$\hat\mu$ is close to zero where the instantaneous rate $R(\cdot)$ is small. (In fact, $R = 0$ for $\hat\mu < \lambda\sigma_z^2$.)
Allowing larger estimation errors for small $\hat\mu$ (relative to the vertical boundary) therefore does not
significantly reduce the overall ergodic rate, whereas it saves a significant amount of training
power. Furthermore, shifting the savings in training power to larger values of $\hat\mu$ reduces the
estimation error for those values, thereby increasing the rate (since the rate increases with $\hat\mu$). It
will be shown in Sec. VIII-A that with an optimized on-off power allocation, for large enough
$N$ the achievable rate for the free boundary control depends on the shape of the free boundary
only through the harmonic mean of the function $\sigma_h^2 - \theta_\epsilon(x)$. This can also be used to show that
the boundary has the general shape shown in Fig. 3.

Of course, the estimation error cannot be too large for small $\hat\mu$, since otherwise $\hat\mu$ may decrease
to zero. (Note from (34) that as $\theta_\epsilon(\hat\mu) \to \sigma_h^2$, the density becomes concentrated around
$\hat\mu = 0$.) Also, Fig. 3 shows that at smaller SNRs $\theta_\epsilon(0)$ is closer to $\sigma_h^2$, so that the free boundary is
more skewed. The curvature of the boundary near $\hat\mu = 4$ is a numerical artifact due to truncation
of the boundary at $u_t = 5$. Namely, this curvature disappears as $u_t$ increases, since as discussed
in the last section, $\theta_f(\hat\mu)$ is a decreasing function of $\hat\mu$ and approaches the value $\theta_\infty$ as $\hat\mu$ becomes
large.




Fig. 3. Optimal free and vertical boundaries with the water-filling data power allocation ($N = 1000$, $\rho = 2$).



b) Gain in Achievable Rate: Fig. 4 compares the rate objective with the optimized free
boundary to that achieved with the optimized vertical boundary at different SNRs. With $\rho = 2$
(fast fading) these results show that optimized pilot power control gives substantial gains at low
SNRs (e.g., a factor of two at an SNR of 3 dB). The percentage gain diminishes with the SNR.
With $\rho = 0.5$, corresponding to a correlation of 0.6 with lag 1000, the gain in achievable rate is
relatively small. The optimized free boundary gives the most gain at low SNRs and fast fading,
since in that region the training power constitutes a larger percentage of the total power budget.

c) Comparison with Monte Carlo Simulations: Fig. 4 also compares the analytical results
from the diffusion model with the performance of the original discrete-time model obtained from
Monte Carlo simulations. Specifically, the discrete-time system was simulated with the
Kalman filter estimator and the switching policy for pilot power defined by the optimized
free and vertical boundaries.$^7$ The comparison in Fig. 4 shows that the analytical results for the
optimized free boundary underestimate the achievable rate by about 15-20%. The simulation
and numerical optimization give nearly the same values with the optimized vertical boundary.

Fig. 4. Achievable rate versus SNR with optimal free and vertical boundary pilot power control.

In addition to the parameters $N$ and $\rho$, which are the same as for the analytical results, for the
discrete-time model another parameter is $M$, the number of samples per coherence block. Recall
that the diffusion model is obtained in the limit as $M/N \to 0$, hence for fixed $N$ the analytical
results should be more accurate for smaller values of $M$ (corresponding to higher correlations
between successive channel gains). However, smaller values of $M$ incur more overhead, since
the training and channel state feedback occur each coherence block. (We discuss this further in
Sec. IX.) The results in Fig. 4 correspond to $M = 5$.

7 The simulated results assume the optimized boundaries obtained in the diffusion limit, since the optimized boundary for the
discrete-time system, given by the solution to (11), is much more difficult to compute. Additional simulation results have shown
that the solution to (11) is quite close to the asymptotic (diffusion) boundary, and gives essentially the same performance.



Fig. 5. Comparison of training power values obtained from Monte Carlo simulations and analysis with different values of the
coherence block size $M$. ($N = 1000$, $\rho = 2$, SNR = 7 dB, $\epsilon_{\max} = 15$ (11.76 dB))



To see the effect of varying $M$ with fixed $N$, Fig. 5 compares the optimized average training
power obtained via analysis and Monte Carlo simulations for different values of $M$ with an
SNR of 7 dB. As expected, the two curves grow apart as $M$ increases, but are reasonably
close for $M < 5$. Fig. 6 compares the simulated steady-state pdf of the channel estimate $\hat\mu$ with
$M = 5$ with the asymptotic pdf (34). The two curves nearly overlap. Further results show that
achievable rates computed from the analytical model nearly match the simulated rates with the
vertical boundary over a wide range of $M$, whereas for the optimized boundary the difference
remains similar to that shown in Fig. 4.

Fig. 7 compares analytical and simulated results as a function of $N$ with $M = 1$. For a fixed
$P_{av}$ and $\rho$, as $N$ decreases, the SNR per sub-channel increases and the channel varies at a faster
rate since the correlation across $N$ channel uses is fixed at $e^{-\rho}$. The achievable rate increases
with $N$ due to the increase in training and channel state feedback. (In Sec. VIII we show that
the rate increases as $\log N$.) Again the analytical results closely match the simulated results with
the optimized vertical boundary, and underestimate the achievable rate by 15-20% with the
optimized free boundary.$^8$ The plots for average training power become close for $N > 200$.

Fig. 6. Channel estimate pdf $f_{\hat\mu}(u)$ obtained from Monte Carlo simulations and the diffusion analysis.
(SNR = 7 dB, $\rho = 2$, $N = 1000$ and $M = 5$)



Fig. 7. Comparison of training power and achievable rates obtained from Monte Carlo simulations and the diffusion analysis
with different values of $N$. ($M = 1$, $\rho = 1$, SNR = 5 dB, $\epsilon_{\max} = 15$ (11.76 dB))

8 The gap between the analytical and simulation results with the optimized free boundary is due to the fact that as $N \to \infty$,
the scaled rate objective increases without bound. This occurs even though from Theorem 2 the steady-state distribution of the
system state converges to the exponential distribution.

VIII. On-Off Data Power Allocation

An important consequence of the optimal switching policy for pilot power control is that
it requires no more than one bit feedback per coherence block. However, optimal data power
control still requires infinite-precision feedback. Therefore to reduce the overall feedback rate,
we now consider on-off data power allocation, which also requires at most one bit feedback per
coherence block. (The feedback could be reduced further by exploiting the time correlation of
the channel.) For the optimization problem (36) we therefore set

$$P(u) = \begin{cases} P_0, & u > \hat\mu_0 \\ 0, & \text{otherwise} \end{cases} \tag{44}$$

where the threshold $\hat\mu_0$ will be optimized, and restate (36) as

$$\max_{P_0,\,\hat\mu_0,\,\theta_\epsilon(u)} \int_{\hat\mu_0}^\infty N R[P_0/N, u, \theta_\epsilon(u)]\, f_{\hat\mu}(u)\, du \tag{45}$$

$$\text{subject to: } q P_0 + \epsilon_{avg} \le P_{av},$$

$$\text{and } \theta_\epsilon(u) \ge \theta^* \text{ for } u \ge 0,$$

where $f_{\hat\mu}(u)$ is the steady-state pdf of the channel estimate given by (34), $\epsilon_{avg} = \int_0^\infty \epsilon(u) f_{\hat\mu}(u)\, du$
is the average training power and $q = \Pr\{\hat\mu > \hat\mu_0\} = \int_{\hat\mu_0}^\infty f_{\hat\mu}(u)\, du$.



A. Harmonic Mean Objective 

Given a free boundary $\theta_\epsilon(u)$ for $u \ge 0$, where $\theta_\epsilon(u) \in (\theta^*, \sigma_h^2)$, the objective in (45) can be
re-written as



Wfr) = iVbg f 1 + ^> \ / (46) 



where we have used the fact that the total power constraint is satisfied with equality. Using (34),
the probability of data transmission is

$$q = \int_{\hat\mu_0}^\infty f_{\hat\mu}(u)\, du = \exp\left(-\int_0^{\hat\mu_0} \frac{du}{\sigma_h^2 - \theta_\epsilon(u)}\right). \tag{47}$$
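
The steady-state pdf and the transmission probability can be evaluated numerically for any candidate boundary. The sketch below assumes that (34) has the form $f_{\hat\mu}(u) = \frac{1}{\sigma_h^2-\theta_\epsilon(u)}\exp\left(-\int_0^u \frac{ds}{\sigma_h^2-\theta_\epsilon(s)}\right)$, which is the form consistent with (47); the function names and the trapezoidal discretization are choices made for this example.

```python
import math

def steady_state_pdf(theta_eps, u_max, n=20001, sigma_h2=1.0):
    """Return the grid u, the pdf f(u), and G(u) = int_0^u ds / (sigma_h2 - theta_eps(s))."""
    du = u_max / (n - 1)
    u = [k * du for k in range(n)]
    inv_g = [1.0 / (sigma_h2 - theta_eps(x)) for x in u]
    G = [0.0]
    for k in range(1, n):
        # cumulative trapezoidal rule for the integral of 1/(sigma_h2 - theta_eps)
        G.append(G[-1] + 0.5 * (inv_g[k] + inv_g[k - 1]) * du)
    f = [math.exp(-Gk) * ig for Gk, ig in zip(G, inv_g)]
    return u, f, G

def transmit_probability(theta_eps, mu0, n=20001, sigma_h2=1.0):
    """q = Pr{estimate > mu0} = exp(-G(mu0)) under the assumed pdf."""
    _, _, G = steady_state_pdf(theta_eps, mu0, n, sigma_h2)
    return math.exp(-G[-1])
```

For a vertical boundary the routine reproduces the exponential pdf with mean $\sigma_h^2 - \theta_v$, which gives a quick sanity check before trying a non-constant boundary.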

Furthermore, the rate in (46) can be bounded as

iVg log 1 + 7— r-j- ^— < i2on-offOo) < ^ log 1 + — , 

(48) 

where the upper bound follows by replacing $\theta_\epsilon(\hat\mu)$ by 0, using Jensen's inequality [34], and the
fact that f°° u f^u) du < fa + o\. 



Observing that $q$ is a decreasing function of $\hat\mu_0$, it can be shown that the upper bound (48) is
maximized with threshold $\hat\mu_0^*$ such that

$$\int_0^{\hat\mu_0^*} \frac{du}{\sigma_h^2 - \theta_\epsilon(u)} = \log\left(\frac{N}{(\log N)^{1+\delta}}\right) \tag{49}$$

where $\delta \in (0, 1)$ is an increasing function of $N$. (An exact description of $\delta$ is unnecessary for the
following analysis.) We can re-write (49) as

$$\hat\mu_0^* = H(\hat\mu_0^*) \log\left(\frac{N}{(\log N)^{1+\delta}}\right) \tag{50}$$

where $H(\hat\mu_0^*) = \hat\mu_0^* \Big/ \int_0^{\hat\mu_0^*} \frac{du}{\sigma_h^2 - \theta_\epsilon(u)}$ is the harmonic mean of $\sigma_h^2 - \theta_\epsilon(u)$ for $u \in [0, \hat\mu_0^*]$, that is, along
the free boundary truncated at the threshold value $\hat\mu_0^*$. Since $H(\hat\mu_0^*) \in (0, \sigma_h^2)$, (50) implies that
$\hat\mu_0^*$ grows as $\log N$. Substituting $\hat\mu_0^*$ into (48), we observe that the upper and lower bounds have
the same asymptotic growth rate, so that the rate (46) also has this growth rate, given by$^9$



H^K/ffl « (logiV)^log(l + ( ^ og ^ ) (51) 

^ (Pgy — Cq,vg)£l>0 

x ( p ™- £ ™g) g (ft$) logA r (53) 

Since $\hat\mu_0^*$ maximizes the upper bound in (48), this is the growth rate of the achievable rate.

We observe that this $\log N$ growth in achievable rate is the same as the growth in achievable
rate for parallel Rayleigh fading channels (in frequency or time) with a sum power constraint
and perfect channel knowledge at the transmitter (e.g., see [35]). This is because the coherence
blocks correspond to separate degrees of freedom (i.e., the transmitter can choose whether or not
to transmit over each block), and the number of coherence blocks increases linearly with $N$. For
our model the associated constant is $(P_{av} - \epsilon_{avg})H(\hat\mu_0^*)$, which accounts for channel estimation
error, and depends on the channel correlation $\rho$. This product therefore determines the shape of
the free boundary. (Note also that $\hat\mu_0^*$ depends on the free boundary.) Namely, choosing boundary
points closer to $\sigma_h^2$ reduces $\epsilon_{avg}$, but also reduces the harmonic mean $H(\hat\mu_0^*)$, and vice versa.
The optimal boundary balances $\epsilon_{avg}$ and $H(\hat\mu_0^*)$ by shifting training power from small values of
$\hat\mu$ to larger values, as discussed previously in Sec. VII-C.
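
To make the role of the harmonic mean concrete, the following sketch computes $H(\hat\mu_0) = \hat\mu_0 / \int_0^{\hat\mu_0} du/(\sigma_h^2-\theta_\epsilon(u))$ for a user-supplied boundary and converts it to a transmission probability through $q = \exp(-\hat\mu_0/H(\hat\mu_0))$, which is (47) rewritten in terms of $H$. The function names and the example boundary are hypothetical.

```python
import math

def harmonic_mean_gap(theta_eps, mu0, n=10001, sigma_h2=1.0):
    """Harmonic mean of sigma_h2 - theta_eps(u) over [0, mu0], via the trapezoidal rule."""
    du = mu0 / (n - 1)
    inv_g = [1.0 / (sigma_h2 - theta_eps(k * du)) for k in range(n)]
    integral = sum(0.5 * (inv_g[k] + inv_g[k - 1]) * du for k in range(1, n))
    return mu0 / integral

def transmit_probability_from_h(theta_eps, mu0, sigma_h2=1.0):
    """q = exp(-mu0 / H(mu0)), equivalent to (47)."""
    return math.exp(-mu0 / harmonic_mean_gap(theta_eps, mu0, sigma_h2=sigma_h2))
```

Because a harmonic mean is dominated by the smallest values of $\sigma_h^2-\theta_\epsilon(u)$, boundary points close to $\sigma_h^2$ pull $H$ down sharply, which is the tradeoff against training power described above.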

B. Numerical Example 

Fig. 8 shows free boundaries at different SNRs obtained by solving the optimization problem
(45) numerically for $N = 200$ and $\rho = 1$. Also shown are the optimized vertical boundaries with
on-off data power control. As with water-filling, the free boundary is shaped to save training
power when $\hat\mu$ is small (high probability region) and re-distribute it to the instances when $\hat\mu$
is large (low probability region). The boundaries shown here are more irregular, due to the
discontinuous data power allocation. The shape of the boundary for $\hat\mu > \hat\mu_0$ is a straight line,
but does not affect the objective since the rate depends on the harmonic mean for $\hat\mu < \hat\mu_0$.

9 The notation $F_1(N) \asymp F_2(N)$ implies that $\lim_{N \to \infty} F_1(N)/F_2(N) = 1$.

Fig. 8. Optimal free and vertical boundaries with on-off data power allocation ($N = 200$, $\rho = 1$). Axis shown: estimation error ($\theta$).

Fig. 9 shows plots of achievable rates versus SNR with the optimized free and vertical
boundaries and on-off data power control. Plots corresponding to the optimal water-filling data
power allocation are also shown for comparison. These results show that the performance with
the optimized on-off power allocation is nearly the same as with water-filling. Also shown are
the rates obtained via Monte Carlo simulations of the discrete-time system with the optimized
boundary. Those are again higher than the rates calculated from the diffusion model, whereas
the simulated rates with the vertical boundary closely match the analytical results.

IX. Training Symbol Overhead 

Fig. 9. Achievable rate versus SNR with on-off data power allocation and optimized free and vertical boundaries for pilot
power control. Also shown for comparison are the corresponding results with the water-filling data power allocation.

So far we have ignored the time overhead due to the channel uses that are occupied by the
training symbols. Here we restate the pilot power control problem taking this overhead into
account. A switching policy for the pilot power requires that one of the $M$ channel uses in a
coherence block is a training symbol whenever the transmitter is directed to train. If the channel
estimate for the coherence block is $\hat\mu$, then the probability of training (as discussed in Sec. VI)
is given by $\epsilon(\hat\mu)/\epsilon_{\max}$, where $\epsilon(\hat\mu) = 2\rho\sigma_z^2(\sigma_h^2 - \theta(\hat\mu))/\theta^2(\hat\mu)$. Therefore the original optimization problem (36)
can be reformulated, taking the training overhead into account, by replacing the rate objective
with

$$\max_{(P(u),\,\theta(u))} \int_0^\infty \left(1 - \frac{\epsilon(u)}{\epsilon_{\max} M}\right) N R[P(u)/N, u, \theta(u)]\, f_{\hat\mu}(u)\, du. \tag{54}$$

Of course, if either $\epsilon_{\max}$ or $M$ is large, then the training symbol overhead is negligible and
the problem reduces to (36). Otherwise, the overhead term will influence the free boundary and
ergodic rate. Specifically, it will reduce the optimal training power $\epsilon(\hat\mu)$ (so that the boundary
shifts towards $\theta = \sigma_h^2$), since the overhead penalty is proportional to the training power.
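
The overhead penalty in (54) is a simple multiplicative factor on the instantaneous rate. A minimal sketch of that factor follows, assuming the average training power at a given estimate is already known; the function names and input checks are illustrative additions.

```python
def overhead_factor(eps_u, eps_max, m):
    """Fraction of channel uses left for data: 1 - eps(u) / (eps_max * M)."""
    if not 0.0 <= eps_u <= eps_max:
        raise ValueError("average training power must lie in [0, eps_max]")
    if m < 1:
        raise ValueError("a coherence block needs at least one channel use")
    return 1.0 - eps_u / (eps_max * m)

def rate_with_overhead(rate, eps_u, eps_max, m):
    """Scale an instantaneous rate by the training-overhead factor."""
    return overhead_factor(eps_u, eps_max, m) * rate
```

With $M = 1$ and training in every block the factor is zero, which is the worst case; it approaches one as either $M$ or $\epsilon_{\max}$ grows.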

Fig. 10 shows plots of the rate objective in (54) versus SNR with optimized free and vertical
boundaries. For this figure $N = 200$ and $M = 1$, corresponding to a worst-case loss in throughput
due to training overhead. Also, $\rho = 1$ and $\epsilon_{\max} = 15$ (11.76 dB). The data power control
is assumed to be on-off and only the analytical results (obtained by maximizing (54)) are
shown. (Note that the channel state pdf is still given by Theorem 2.) At low SNRs the average
training power and associated overhead are small, so that taking the overhead into account
does not significantly affect the rate. At high SNRs (around 10 dB) the training overhead
reduces the achievable rate by about 15% with both the free and vertical boundaries. The
percentage improvement provided by the free boundary relative to the vertical boundary remains
approximately the same.

Fig. 10. Achievable rates taking pilot symbol overhead into account.

A final remark is that when symbol overhead is taken into account, the throughput associated
with the vertical switching policy is no longer the same as that associated with constant power
control. That is because the switching policy requires on average $\epsilon_{avg}/\epsilon_{\max} < 1$ channel uses
for training every coherence block, whereas constant power control requires one channel use
for training every coherence block. Of course, this savings in overhead for the switching policy
comes at the cost of feedback.

X. Conclusions 

We have studied achievable rates for a correlated Rayleigh fading channel, where both the 
data and pilot power are adapted based on estimated channel gain. In low SNR and fast fading 
scenarios the pilot power constitutes a substantial fraction of the total power budget, so that pilot 
power adaptation can provide a substantial gain in achievable rates. By taking a diffusion limit, 

August 10, 2009 DRAFT 






corresponding to low SNRs (or wideband channel) and high correlation between consecutive 
channel realizations, several insights were obtained about the optimal pilot power control policy. 
Namely, it was shown that a policy that switches between zero and peak training power is 
optimal, and that the training power should be reduced when the channel is bad and increased 
when the channel becomes good. The optimal policy in the diffusion limit was also explicitly 
characterized, and shown to provide a significant increase in achievable rate for low SNRs and 
fast fading. 

For the discrete-time system of interest the switching policy is equivalent to maintaining
constant pilot symbol power, but inserting pilot symbols less frequently when the channel
estimate is weak (and vice versa). When combined with on-off data power control, this requires
finite feedback, and achieves essentially the same performance as the optimal (water-filling)
data power control. Of course, the CSI feedback required for optimal data and pilot power control
can be substantially reduced by exploiting the correlation between successive coherence blocks.

Several modeling assumptions have been made, which could be relaxed in future work. For
example, we have assumed that the receiver knows the statistical model of the channel. In
practice, the receiver may assume (or estimate) a model, such as the first-order model considered
here, which is mismatched to the actual channel statistics. An issue then is how sensitive the
overall performance is to this mismatch. Also, the first-order Rayleigh fading model might be
replaced with other fading models (e.g., Ricean, Nakagami, and higher-order autoregressive models).

Additional issues may arise when considering other channel models. For example, here we 
have imposed a power constraint, which is averaged over many coherence blocks. The results 
can therefore be directly applied to parallel fading channels where the total power constraint is 
split among the channels. However, for a frequency-selective channel the total power summed 
over parallel channels might instead be constrained per coherence block. Other extensions and 
applications of diffusion models to Multi-Input Multi-Output (MIMO) and multiuser channels 
remain to be explored. 

Appendix I 

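Continuous-Time Limit of the Discrete-Time Processes

Substituting $r = 1 - \rho\,\delta t$ into the discrete-time channel model and ignoring terms with higher powers of $\delta t$, we obtain

$$h_{i+1} - h_i = -\rho\,\delta t\, h_i + \sqrt{2\rho\,\delta t}\; w_i. \tag{55}$$

In the diffusion limit the noise $\sqrt{\delta t}\, w_i$ can be modeled as $dB(t)$, where $B(t)$ is a standard
complex Brownian motion. Hence as $\delta t \to 0$, the preceding equation becomes (13).
Substituting $r = 1 - \rho\,\delta t$ in the expression (7) for $\theta_{i+1|i}$ gives

$$\theta_{i+1|i} = (1 - 2\rho\,\delta t)\,\theta_i + 2\rho\,\delta t\,\sigma_h^2. \tag{56}$$

Replacing $\epsilon_i$ by $\epsilon_i/N$ in the expression (4) for $\theta_{i+1}$ and substituting $\delta t$ for $M/N$ gives

$$\theta_{i+1} = \theta_{i+1|i} - \frac{\theta_{i+1|i}\,\theta_{i+1}\,\epsilon_i}{\sigma_z^2}\,\delta t. \tag{57}$$

Combining (56) and (57), ignoring the $(\delta t)^2$ term, gives

$$\theta_{i+1} - \theta_i = 2\rho\,\delta t\,[\sigma_h^2 - \theta_i] - \frac{\theta_{i+1}\,\theta_i\,\epsilon_i}{\sigma_z^2}\,\delta t \tag{58}$$

which becomes (16) as $\delta t \to 0$.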
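
The passage from the discrete-time error recursion to its continuous-time limit can be illustrated numerically. The sketch below iterates a prediction step of the form (56) followed by a standard Kalman variance update (assumed here to be $1/\theta_{i+1} = 1/\theta_{i+1|i} + \epsilon\,\delta t/\sigma_z^2$, which may differ in detail from the paper's update equation), and compares the fixed point with the steady state of $d\theta/dt = 2\rho(\sigma_h^2-\theta) - \epsilon\theta^2/\sigma_z^2$. All function names are illustrative.

```python
import math

def discrete_error_fixed_point(eps, rho, dt, sigma_h2=1.0, sigma_z2=1.0, steps=20000):
    """Iterate prediction and update of the error variance with constant training power eps."""
    theta = sigma_h2
    for _ in range(steps):
        # prediction step, cf. (56)
        pred = (1.0 - 2.0 * rho * dt) * theta + 2.0 * rho * dt * sigma_h2
        # assumed Kalman variance update with training energy eps * dt
        theta = pred / (1.0 + pred * eps * dt / sigma_z2)
    return theta

def continuous_error_fixed_point(eps, rho, sigma_h2=1.0, sigma_z2=1.0):
    """Steady state of the limiting ODE; requires eps > 0."""
    a, b, c = eps / sigma_z2, 2.0 * rho, -2.0 * rho * sigma_h2
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
```

Shrinking the step size `dt` moves the discrete fixed point toward the continuous one, which mirrors the $\delta t \to 0$ argument in this appendix.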

Substituting for $r$ and replacing $\epsilon_i$ by $\epsilon_i/N$, the update for the channel estimate $\hat h_i$ can be re-written as,



i+l 



hi 



-phi St 



St(h i+ i — hi — pSt hi) + a/ 7i St a 2 n 



i+l 



(59) 



where $n_{i+1}$ is a zero mean unit variance CSCG random variable independent of $w_i$. The term
$(\epsilon_i\theta_{i+1}/\sigma_z^2)\,\delta t\,(h_{i+1} - \hat h_i)$ is a CSCG random variable with mean zero and variance proportional to
$E[|h_{i+1} - \hat h_i|^2](\delta t)^2$, and hence can be ignored. Modeling $\sqrt{\delta t}\, n_{i+1}$ as $d\bar B(t)$, where $\bar B(t)$ is a standard
complex Brownian motion independent of $B(t)$, gives (15) as $\delta t \to 0$.



Appendix II 
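Proof of Lemma 1

Defining the $3 \times 1$ state vector as $G(t) = [\hat h_r(t), \hat h_j(t), \theta(t)]^T$, (15) and (16) can be re-written
as,

$$dG(t) = D[\hat h_r(t), \hat h_j(t), \theta(t)]\, dt + V[\hat h_r(t), \hat h_j(t), \theta(t)]\, dB(t) \tag{60}$$

where the drift and variance are given by

$$D(\hat h_r, \hat h_j, \theta) = \left[-\rho \hat h_r,\; -\rho \hat h_j,\; -2\rho\theta - \frac{\epsilon\theta^2}{\sigma_z^2} + 2\rho\sigma_h^2\right]^T \tag{61}$$

$$V(\hat h_r, \hat h_j, \theta) = \mathrm{diag}\left[\frac{\theta\sqrt{\epsilon}}{\sqrt{2\sigma_z^2}},\; \frac{\theta\sqrt{\epsilon}}{\sqrt{2\sigma_z^2}},\; 0\right] \tag{62}$$

respectively, the dependence on time is dropped for notational convenience, and the three entries
of the vector $B(t)$ are independent, real-valued, standard Brownian motions.
From [25, Theorem 5.2.1], the solution to (60) exists and is continuous
in $t$ provided that the following two conditions, (63) and (64), are satisfied:

$$|D(\hat h_r, \hat h_j, \theta)| + |V(\hat h_r, \hat h_j, \theta)| \le C_1\left(1 + \sqrt{\hat h_r^2 + \hat h_j^2 + \theta^2}\right) \tag{63}$$

$$|D(\hat h_{r1}, \hat h_{j1}, \theta_1) - D(\hat h_{r2}, \hat h_{j2}, \theta_2)| + |V(\hat h_{r1}, \hat h_{j1}, \theta_1) - V(\hat h_{r2}, \hat h_{j2}, \theta_2)| \le C_2 \sqrt{(\hat h_{r1} - \hat h_{r2})^2 + (\hat h_{j1} - \hat h_{j2})^2 + (\theta_1 - \theta_2)^2} \tag{64}$$

where for any matrix $M$ with $(k,l)$ entry $M_{kl}$, $|M| = \sqrt{\sum_{k,l} M_{kl}^2}$, and $C_1, C_2$ are constants.
Condition (63) is called the linear dominance property and (64) is called the Lipschitz property.
Given $0 \le \epsilon \le \epsilon_{\max}$, we have

$$|D| + |V| = \sqrt{\rho^2 \hat h_r^2 + \rho^2 \hat h_j^2 + \left(-2\rho\theta - \frac{\epsilon\theta^2}{\sigma_z^2} + 2\rho\sigma_h^2\right)^2} + \sqrt{\frac{\epsilon\theta^2}{\sigma_z^2}} \tag{65}$$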



< \ 2P + 



2\ 2 



/l? + /i 2 + 02 



e 02 



+ 4pV* + A /^- (66) 



32 



< ( 2p + f^T) Jfy + fc. +02 + ^/^+ J^-. (67) 



so that (63) is satisfied. Similarly, for $\theta_1 > \theta_2$ we have

\B(h rl ,hj!,9!) -B(h r2 ,h j2 ,9 2 )\ + |V(/i rl , h jlt Oi) -V(h r2 ,h j2 ,9 2 ) 



< \ p 2 Chn-h r 2) 2 + p 2 Chji-h, 2 ) 2 + 



9 21 2 

(J 2 



n -- 2 ) 2 + ,/^(0!-0 2 ) 2 . (68) 



so that (64) is satisfied. Since the solution to (18) and (16), $S(t) = (\hat\mu(t), \theta(t))$, is a continuous
function of $G(t)$, it must also be continuous in $t$.

Appendix III 

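Alternative Derivation of Continuous-Time Bellman Equation (25)
We first rewrite the discrete-time Bellman equation (11) as

$$C = \max_{P,\,\epsilon} \left\{ R(P, \hat\mu, \theta) - \lambda(\epsilon + P) + E_{\epsilon,(\hat\mu,\theta)}[V] - V(\hat\mu, \theta) \right\}, \tag{69}$$

where

$$E_{\epsilon,(\hat\mu,\theta)}[V] - V(\hat\mu, \theta) = \int_0^\infty \left[V(u, \theta_{i+1}) - V(\hat\mu, \theta)\right] f_{\hat\mu_{i+1}|S_i}(u)\, du. \tag{70}$$

Assuming that $V(\hat\mu, \theta)$ is a continuous and smooth function, we can expand $V$ around $(\hat\mu, \theta)$
via the Taylor series

$$V(u, \theta_{i+1}) - V(\hat\mu, \theta) = \frac{\partial V}{\partial\hat\mu}(u - \hat\mu) + \frac{\partial V}{\partial\theta}(\theta_{i+1} - \theta)$$

$$+ \frac{1}{2}\left[\frac{\partial^2 V}{\partial\hat\mu^2}(u - \hat\mu)^2 + 2\frac{\partial^2 V}{\partial\hat\mu\,\partial\theta}(u - \hat\mu)(\theta_{i+1} - \theta) + \frac{\partial^2 V}{\partial\theta^2}(\theta_{i+1} - \theta)^2\right] + \text{higher-order terms} \tag{71}$$

where all the derivatives are computed at $(\hat\mu, \theta)$. As stated in Sec. III, $\hat\mu_{i+1}$ conditioned on $S_i$ is
Ricean, so that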



\ / 



u > 



where $I_0(\cdot)$ is the zeroth-order modified Bessel function of the first kind and



eM9 



On 



9i+l 



(72) 



(73) 



where $\theta_{i+1}$ and $\theta_{i+1|i}$ are given by (4) and (7), respectively, with $\theta_i$ replaced by $\theta$. The first two
moments are [36, Ch. 2],



E[fi i+1 \(ji,e)] 

E[p 2 +1 \(p,9)] = 2a 4 



2-, 2 

r p + cr c 



(74) 
(75) 



Next we take the diffusion limit. Substituting $r = 1 - \rho\,\delta t$ in (73) and replacing $\epsilon$ by $\epsilon/N$
gives



fl 2 e 



0" 2 



(<Jt) + 0(5t 2 



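Making these substitutions in (74)-(75) gives

$$E[\hat\mu_{i+1} - \hat\mu \mid (\hat\mu, \theta)] = \left[-2\rho\hat\mu + \frac{\epsilon\theta^2}{\sigma_z^2}\right]\delta t + O(\delta t^2) \tag{77}$$

$$E[(\hat\mu_{i+1} - \hat\mu)^2 \mid (\hat\mu, \theta)] = \left[\frac{2\hat\mu\,\epsilon\theta^2}{\sigma_z^2}\right]\delta t + O(\delta t^2) \tag{78}$$

It is easily shown that the higher-order moments $E[(\hat\mu_{i+1} - \hat\mu)^n \mid (\hat\mu, \theta)] \le O(\delta t^2)$ for $n > 2$,
hence we can ignore the higher-order terms in (71).

Substituting $\theta_i = \theta$ and taking the diffusion limit, the update for $\theta_{i+1}$ can be re-written as

$$\theta_{i+1} - \theta = \left[-2\rho\theta + 2\rho\sigma_h^2 - \frac{\epsilon\theta^2}{\sigma_z^2}\right]\delta t \tag{79}$$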






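Substituting (71) into (70) and combining with (77)-(79) gives

$$E_{\epsilon,(\hat\mu,\theta)}[V] - V(\hat\mu, \theta) = \frac{\partial V}{\partial\hat\mu}\left[-2\rho\hat\mu + \frac{\epsilon\theta^2}{\sigma_z^2}\right]\delta t + \frac{\partial V}{\partial\theta}\left[-2\rho\theta + 2\rho\sigma_h^2 - \frac{\epsilon\theta^2}{\sigma_z^2}\right]\delta t + \frac{\partial^2 V}{\partial\hat\mu^2}\left[\frac{\hat\mu\,\epsilon\theta^2}{\sigma_z^2}\right]\delta t \tag{80}$$

Lastly, $C$, $R$, $\epsilon$ and $P$ can be multiplied by $\delta t$ without changing the original optimization problem.
Applying this scaling in (69), substituting (80) into (69), multiplying the entire equation by
$1/\delta t$, and letting $\delta t \to 0$ gives (25).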



Appendix IV 
Proof of Proposition 1

Define the distance from the free boundary at time $t$ as $\kappa(t) = \theta(t) - \theta_\epsilon(\hat\mu(t))$. Irrespective of
the initial state, due to the drift term in (16) and time continuity of the state process (Lemma 1),
with probability one there exists a finite time instant such that the state lies on the free boundary.
Without loss of generality, rename that instance as $t = 0$ so that $\kappa(0) = 0$. For any $\eta > 0$ let
$t_1 = \inf\{t > 0 : \kappa(t) \le -\eta\}$. By continuity of the state process, $\kappa(t_1) = -\eta$ and there exists a

$$t_0 = \sup\{t \in (0, t_1) : \kappa(t) = -\eta/2\}.$$

If $\kappa(t) < 0$, bang-bang control implies that $\epsilon(t) = 0$ and the dynamical equations (16) and
(18) simplify to $d\theta = 2\rho(\sigma_h^2 - \theta)\,dt$ and $d\hat\mu = -2\rho\hat\mu\, dt$. Then

d6 t (p) 



dn = 2p {ai - 9) - ft dt, (81) 

and since «(£) < implies 9 < 9 e (p), we have dn > 2p (o\ — 9 e ) — p^f- dt. Thus given 
condition (30), we have $d\kappa > 0$ whenever $\kappa(t) < 0$. However, this contradicts the fact that
$\int_{t_0}^{t_1} d\kappa = \kappa(t_1) - \kappa(t_0) = -\eta/2$. Therefore we cannot have a $t_1 < \infty$, which implies that
$\lim_{t \to \infty} \Pr\{\theta(t) < \theta_\epsilon(\hat\mu(t)) - \eta\} = 0$ for any $\eta > 0$.

Next we show that $\lim_{t \to \infty} \Pr\{\theta(t) > \theta_\epsilon(\hat\mu(t)) + \eta\} = 0$. For any continuous and twice
differentiable function $W(\theta, \hat\mu)$ we must have [25, Ch. 7]

$$E\{A_\epsilon[W(\theta, \hat\mu)]\} = 0, \tag{82}$$

where the expectation is over the steady-state distribution of the state $(\theta, \hat\mu)$. The generator $A_\epsilon[\cdot]$
is defined as in (21), except that the function $V(\cdot)$ is replaced by $W(\cdot)$. We choose $W(\cdot)$ to be
a function of $\theta$ only, i.e., $W(\theta, \hat\mu) = W(\theta)$, so that

$$A_\epsilon[W] = A_\epsilon[W(\theta)] = W'(\theta)\left[2\rho(\sigma_h^2 - \theta) - \frac{\epsilon\theta^2}{\sigma_z^2}\right] \tag{83}$$

where $W'(\theta)$ denotes the derivative of $W(\theta)$ with respect to $\theta$. Let $p(u)$ denote the steady-state
probability of training given that the channel estimate is $u$, as in (33). Rewriting (82) as

$$E_{\hat\mu}\left\{E_{\theta|\hat\mu}\left[A_\epsilon[W(\theta)]\right]\right\} = 0, \tag{84}$$

and evaluating the inner conditional expectation using the preceding result that
$\Pr\{\theta < \theta_\epsilon(\hat\mu) - \eta \mid \hat\mu\} = 0$ for any $\eta > 0$ gives


Ejx {(1 — p(p))W'(9 e (p)) [2p(a 2 h - 9 e (p))} } 



A;0>e e (/i) 



W\9) 2p{a\ -9)- 



32 \ 1 



at 



(85) 



where E, 



e\p,;e>e e (p,)[ 



denotes the expectation over 9 given that the estimate is p and 9 > 9 e (p). 



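Choosing $W(\theta)$ such that $W'(\theta) = 1 / \left[ 2\rho(\sigma_h^2 - \theta) - \epsilon_{\max}\theta^2/\sigma_z^2 \right]$ and substituting in (85) gives

$$ E_{\hat{\mu}}\left[ (1 - p(\hat{\mu}))\, \frac{2\rho(\sigma_h^2 - \theta_\epsilon(\hat{\mu}))}{2\rho(\sigma_h^2 - \theta_\epsilon(\hat{\mu})) - \epsilon_{\max}\theta_\epsilon^2(\hat{\mu})/\sigma_z^2} \right] + E_{\hat{\mu}}[p(\hat{\mu})] = 0, \tag{86} $$

which implies

$$ E_{\hat{\mu}}\left[ (1 - p(\hat{\mu}))\, 2\rho(\sigma_h^2 - \theta_\epsilon(\hat{\mu})) \right] = E_{\hat{\mu}}\left\{ p(\hat{\mu}) \left[ \frac{\epsilon_{\max}\theta_\epsilon^2(\hat{\mu})}{\sigma_z^2} - 2\rho(\sigma_h^2 - \theta_\epsilon(\hat{\mu})) \right] \right\}. \tag{87} $$

Next choose $W(\theta) = \theta$ so that $W'(\theta) = 1$. For this choice (85) gives

$$ E_{\hat{\mu}}\left[ (1 - p(\hat{\mu})) \left[ 2\rho(\sigma_h^2 - \theta_\epsilon(\hat{\mu})) \right] \right] = E_{\hat{\mu}}\left\{ p(\hat{\mu})\, E_{\theta|\hat{\mu};\,\theta>\theta_\epsilon(\hat{\mu})}\left[ \frac{\epsilon_{\max}\theta^2}{\sigma_z^2} - 2\rho(\sigma_h^2 - \theta) \right] \right\}. \tag{88} $$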



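We now argue by contradiction that for any $\eta > 0$, $\Pr\{\theta > \theta_\epsilon(\hat{\mu}) + \eta \,|\, \hat{\mu}\} = 0$ almost everywhere (a.e.) in the set $\mathcal{M} = \{\hat{\mu} : \hat{\mu} > 0\}$. If this were not the case, then we must have $p(\hat{\mu}) > 0$ over a subset in $\mathcal{M}$ with positive measure. Since $\epsilon_{\max}\theta^2/\sigma_z^2 - 2\rho(\sigma_h^2 - \theta)$ is a positive increasing function of $\theta$, (88) implies

$$ E_{\hat{\mu}}\left[ (1 - p(\hat{\mu}))\, 2\rho(\sigma_h^2 - \theta_\epsilon(\hat{\mu})) \right] > E_{\hat{\mu}}\left\{ p(\hat{\mu}) \left[ \frac{\epsilon_{\max}\theta_\epsilon^2(\hat{\mu})}{\sigma_z^2} - 2\rho(\sigma_h^2 - \theta_\epsilon(\hat{\mu})) \right] \right\} \tag{89} $$

with strict inequality, which contradicts (87). Hence this establishes the proposition for any $\eta > 0$.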
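
Appendix V
Proof of Theorem 2

As in Appendix IV, we use the fact that for any continuous and twice differentiable function $W(\theta, \hat{\mu})$, we have [25, Ch. 7]

$$ E\{A_\epsilon[W(\theta, \hat{\mu})]\} = 0, \tag{90} $$

where the expectation is over the steady-state distribution of $(\theta, \hat{\mu})$ and the generator $A_\epsilon[\cdot]$ is given by (21) with $V(\cdot)$ replaced by $W(\cdot)$. Choosing $W(\hat{\mu}, \theta) = W_1(\theta)$ in (90) to be a function of $\theta$ only and applying the proposition established in Appendix IV gives

$$ E_{\hat{\mu}}\left[ W_1'(\theta_\epsilon(\hat{\mu}))\, g(\hat{\mu}) \right] = 0, \tag{91} $$

where

$$ g(\hat{\mu}) = (1 - p(\hat{\mu}))\, 2\rho(\sigma_h^2 - \theta_\epsilon(\hat{\mu})) + p(\hat{\mu}) \left[ 2\rho(\sigma_h^2 - \theta_\epsilon(\hat{\mu})) - \frac{\epsilon_{\max}\theta_\epsilon^2(\hat{\mu})}{\sigma_z^2} \right]. \tag{92} $$

Next we observe that $g(\hat{\mu}) = 0$ a.e. in the set $\mathcal{M} = \{\hat{\mu} : \hat{\mu} > 0\}$. If this were not the case, then since $\theta_\epsilon(x)$ is a one-to-one function, we could choose $W_1(\theta)$ such that $W_1'(\theta_\epsilon(\hat{\mu})) = g(\hat{\mu})$, which would make the left-hand side of (91) equal to $E_{\hat{\mu}}[g^2(\hat{\mu})]$ and hence strictly positive, contradicting (91). Therefore setting $g(\hat{\mu}) = 0$ gives the steady-state probability of training given $\hat{\mu}$ shown in (33).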
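
As a worked step, and assuming $g$ has the two-term form in (92) (a no-training term weighted by $1 - p$ and a training term weighted by $p$), setting $g = 0$ can be solved for $p$ explicitly. The closed form below is a rearrangement made here for illustration, not a quotation of (33), and should agree with it.

```latex
\begin{align*}
g(\hat\mu)
 &= (1-p(\hat\mu))\,2\rho\bigl(\sigma_h^2-\theta_\epsilon(\hat\mu)\bigr)
  + p(\hat\mu)\Bigl[2\rho\bigl(\sigma_h^2-\theta_\epsilon(\hat\mu)\bigr)
  - \frac{\epsilon_{\max}\,\theta_\epsilon^2(\hat\mu)}{\sigma_z^2}\Bigr] \\
 &= 2\rho\bigl(\sigma_h^2-\theta_\epsilon(\hat\mu)\bigr)
  - p(\hat\mu)\,\frac{\epsilon_{\max}\,\theta_\epsilon^2(\hat\mu)}{\sigma_z^2}, \\[4pt]
g(\hat\mu)=0
 \;&\Longrightarrow\;
 p(\hat\mu)
 = \frac{2\rho\,\sigma_z^2\bigl(\sigma_h^2-\theta_\epsilon(\hat\mu)\bigr)}
        {\epsilon_{\max}\,\theta_\epsilon^2(\hat\mu)}.
\end{align*}
```

This expression is a valid probability only when $\epsilon_{\max}\theta_\epsilon^2(\hat\mu)/\sigma_z^2 \ge 2\rho(\sigma_h^2-\theta_\epsilon(\hat\mu))$, that is, when the error variance can actually be driven down at the boundary while training at peak power.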

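We now solve for the steady-state pdf $f_{\hat{\mu}}(u)$. Choosing $W(\theta, \hat{\mu}) = W_2(\hat{\mu})$, a continuous and twice differentiable function of $\hat{\mu}$ only, and applying the generator (21) gives

$$ A_\epsilon[W_2] = -2\rho\hat{\mu}\, W_2'(\hat{\mu}) + \left[ W_2'(\hat{\mu}) + \hat{\mu}\, W_2''(\hat{\mu}) \right] \frac{\epsilon\theta^2}{\sigma_z^2}. \tag{93} $$

The necessary condition (90) can now be written as

$$ \int_0^\infty \left[ C(u) W_2'(u) + D(u) W_2''(u) \right] f_{\hat{\mu}}(u)\, du = 0, \tag{94} $$

where

$$ C(u) = -2\rho u + \frac{\theta_\epsilon^2(u)\,\epsilon_{\max}}{\sigma_z^2}\, p(u) \quad \text{and} \quad D(u) = u\, \frac{\theta_\epsilon^2(u)\,\epsilon_{\max}}{\sigma_z^2}\, p(u). \tag{95} $$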
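We can further choose $W_2(u)$ to satisfy the following properties:

$$ W_2(0) = 0 \tag{96} $$
$$ \lim_{u\to\infty} C(u) f_{\hat{\mu}}(u) W_2(u) = 0 \tag{97} $$
$$ \lim_{u\to\infty} D(u) f_{\hat{\mu}}(u) W_2'(u) = 0 \tag{98} $$
$$ \lim_{u\to\infty} \frac{d[D(u) f_{\hat{\mu}}(u)]}{du}\, W_2(u) = 0 \tag{99} $$

and using integration by parts we can rewrite (94) as

$$ \int_0^\infty W_2(u) \left( \frac{d^2}{du^2}\left[ D(u) f_{\hat{\mu}}(u) \right] - \frac{d}{du}\left[ C(u) f_{\hat{\mu}}(u) \right] \right) du = 0. \tag{100} $$

Since this condition must be satisfied for any such $W_2(\cdot)$, we have

$$ \frac{d^2}{du^2}\left[ D(u) f_{\hat{\mu}}(u) \right] - \frac{d}{du}\left[ C(u) f_{\hat{\mu}}(u) \right] = 0, \quad \text{a.e. } u > 0. \tag{101} $$

Substituting (33) into (101) gives the differential equation

$$ \frac{d^2}{du^2}\left[ u\, 2\rho(\sigma_h^2 - \theta_\epsilon(u))\, f_{\hat{\mu}}(u) \right] - \frac{d}{du}\left[ \left( 2\rho(\sigma_h^2 - \theta_\epsilon(u)) - 2\rho u \right) f_{\hat{\mu}}(u) \right] = 0, \quad \text{a.e. } u > 0, \tag{102} $$

which, after integrating once with respect to $u$ and expanding the derivative of the product, can be further simplified as

$$ 2\rho\left[ \sigma_h^2 - \theta_\epsilon(u) \right] u\, \frac{d f_{\hat{\mu}}(u)}{du} + 2\rho u \left( 1 - \frac{d\theta_\epsilon(u)}{du} \right) f_{\hat{\mu}}(u) + K = 0, \tag{103} $$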
where $K$ is a constant. This is a first-order ordinary differential equation with solution

$$ f_{\hat{\mu}}(u) = -K \exp[-F(u)] \int_0^u \frac{\exp[F(t)]}{2\rho\left[ \sigma_h^2 - \theta_\epsilon(t) \right] t}\, dt + K_1 \exp[-F(u)], \tag{104} $$

where

$$ F(u) = \int_0^u \frac{1 - d\theta_\epsilon(t)/dt}{\sigma_h^2 - \theta_\epsilon(t)}\, dt \tag{105} $$

and $K_1$ is another constant, which needs to be determined. Since $f_{\hat{\mu}}(u)$ is a pdf, we must have $\lim_{u\to\infty} f_{\hat{\mu}}(u) = 0$, which implies $K = 0$. This is because the first integral in (104) is unbounded. In addition we must have $\int_0^\infty f_{\hat{\mu}}(u)\, du = 1$, which implies $K_1 = 1/(\sigma_h^2 - \theta_\epsilon(0))$. Substituting these values into (104) gives (34).
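
The normalization constant can be checked numerically. The sketch below assumes the solution takes the form $f_{\hat\mu}(u) = K_1 \exp[-F(u)]$ with $F(u) = \int_0^u [1 - \theta_\epsilon'(t)]/[\sigma_h^2 - \theta_\epsilon(t)]\,dt$ and $K_1 = 1/(\sigma_h^2 - \theta_\epsilon(0))$, which follows from the identity $\frac{d}{du}\{[\sigma_h^2 - \theta_\epsilon(u)]\,e^{-F(u)}\} = -e^{-F(u)}$. The boundary shape `theta_eps`, the value of `SIGMA_H2`, and the grid parameters are hypothetical choices made only for illustration; they are not taken from the paper.

```python
import math

SIGMA_H2 = 1.0  # assumed channel variance sigma_h^2 (illustrative value)

def theta_eps(u):
    """Hypothetical smooth free boundary theta_eps(u); not the paper's boundary."""
    return 0.2 + 0.3 * (1.0 - math.exp(-u))

def dtheta_eps(u):
    """Derivative of the hypothetical boundary with respect to u."""
    return 0.3 * math.exp(-u)

def integrand_F(u):
    """Integrand of F(u): (1 - theta_eps'(u)) / (sigma_h^2 - theta_eps(u))."""
    return (1.0 - dtheta_eps(u)) / (SIGMA_H2 - theta_eps(u))

def steady_state_pdf(u_max=40.0, n=40000):
    """Tabulate f(u) = K1 * exp(-F(u)) on a uniform grid (trapezoid rule for F)."""
    h = u_max / n
    k1 = 1.0 / (SIGMA_H2 - theta_eps(0.0))  # claimed normalization constant
    grid, pdf, F = [0.0], [k1], 0.0
    for i in range(1, n + 1):
        u_prev, u = (i - 1) * h, i * h
        F += 0.5 * h * (integrand_F(u_prev) + integrand_F(u))
        grid.append(u)
        pdf.append(k1 * math.exp(-F))
    return grid, pdf

def trapezoid(xs, ys):
    """Trapezoid-rule integral of tabulated values ys over the grid xs."""
    return sum(0.5 * (xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i])
               for i in range(len(xs) - 1))

grid, pdf = steady_state_pdf()
total = trapezoid(grid, pdf)
print(f"integral of pdf over [0, {grid[-1]:g}] = {total:.6f}")
```

For a constant boundary the same formula reduces to an exponential density with mean $\sigma_h^2 - \theta_\epsilon$, which is a convenient second sanity check.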

Appendix VI
Derivation of (39)-(40)

First we fix the free boundary $\theta_\epsilon(\hat{\mu})$ and optimize the data power allocation. For any $\hat{\mu} > 0$, setting the derivative of the objective function (38) with respect to $P(\hat{\mu})$ to zero gives the optimal power allocation $P^*(\hat{\mu}) = NP_d(\hat{\mu}, \theta_\epsilon(\hat{\mu}), \lambda)$. Substituting this $P^*(\hat{\mu})$ into (38) and taking the derivative with respect to $\theta_\epsilon(\hat{\mu})$ gives the optimality condition



/a (A) 



- I- A [P*(A), A, W)] ^, + [o 2 _ g e(/1)]2 / L x [P*{v),vMv)\h{v)dv. (108) 

Note that P*(/i) = for jl < \a\ so that = 0. For /2 > Aa^ we have f^[P*(/2), /}, e (/2)] 

0. Therefore (11081) reduces to (|40"1) with 6> £ replaced by The additional constraint £ (/t) > 9* 

implies (l39l) . 

Appendix VII 

Free Boundary Problem as a Quadratic Optimization 
We first observe that (26) can be written as the variational inequality [28]

$$ C - J - a \ge 0 $$
$$ C - J - a - \epsilon_{\max}(b - \lambda) \ge 0 $$
$$ (C - J - a)\left( C - J - a - \epsilon_{\max}(b - \lambda) \right) = 0 \tag{109} $$

A solution to (109) is a solution to (26) and vice versa. Now consider the following optimization problem,

$$ \min \; w_0 \iint v_1 v_2 \, d\theta\, d\hat{\mu} + \sum_{x \in X} w_x \iint \left[ (\partial_x v_1)^2 + (\partial_x v_2)^2 \right] d\theta\, d\hat{\mu} $$
$$ \text{subject to: } C - J - a = v_1 \ge 0 $$
$$ v_1 - \epsilon_{\max}(b - \lambda) = v_2 \ge 0 \tag{110} $$

where $\partial_x v_i = \partial v_i / \partial x$ for $x \in X = \{\hat{\mu}, \theta\}$ and $i = 1, 2$. If $w_0 > 0$, $w_\theta = 0$, and $w_{\hat{\mu}} = 0$, then the solution to (109) is a solution to (110). Also, a solution to (110) with zero objective value is a solution to (109). The second term in the objective function is included to regularize the numerical solution. The effect of this term can be controlled by changing the weights $w_\theta$ and $w_{\hat{\mu}}$. These weights affect both the accuracy of the results and the rate at which the non-linear optimization algorithm converges. The training region is where $v_1(\hat{\mu}, \theta) > 0$. Therefore the free boundary can be obtained by solving (110) numerically given values for $\lambda$ and $\rho$.
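
The idea of recasting a complementarity condition as a constrained minimization can be illustrated on a deliberately simplified one-dimensional toy problem. In the sketch below the quantity `q` stands in for $\epsilon_{\max}(b - \lambda)$ and is treated as known data, whereas in the actual problem both constraints depend on the unknown function $J$. The grid, the choice of `q`, the single smoothness weight `w_reg`, the step size, and the projected-gradient solver are all assumptions made for illustration; the paper does not specify the optimization algorithm.

```python
def objective_terms(v1, q):
    """Return (complementarity gap, smoothness penalty) with v2 = v1 - q."""
    v2 = [a - b for a, b in zip(v1, q)]
    gap = sum(a * b for a, b in zip(v1, v2))
    rough = sum((v1[i + 1] - v1[i]) ** 2 + (v2[i + 1] - v2[i]) ** 2
                for i in range(len(v1) - 1))
    return gap, rough

def solve_toy(q, w0=1.0, w_reg=0.0, steps=3000, lr=0.1):
    """Projected gradient descent on w0 * sum(v1*v2) + w_reg * smoothness penalty,
    subject to v1 >= 0 and v2 = v1 - q >= 0."""
    n = len(q)
    lower = [max(0.0, qi) for qi in q]   # feasible set: v1 >= max(0, q)
    v1 = [lo + 1.0 for lo in lower]      # strictly feasible starting point
    for _ in range(steps):
        v2 = [a - b for a, b in zip(v1, q)]
        # derivative of v1*(v1 - q) with respect to v1 is 2*v1 - q = v1 + v2
        grad = [w0 * (v1[i] + v2[i]) for i in range(n)]
        for i in range(n - 1):
            d = 2.0 * w_reg * ((v1[i + 1] - v1[i]) + (v2[i + 1] - v2[i]))
            grad[i] -= d
            grad[i + 1] += d
        # gradient step followed by projection onto the feasible set
        v1 = [max(lower[i], v1[i] - lr * grad[i]) for i in range(n)]
    return v1

# Hypothetical data: q changes sign at x = 0.5 on a uniform grid over [0, 1].
xs = [i / 20 for i in range(21)]
q = [0.5 - x for x in xs]

v1 = solve_toy(q)                        # no regularization
gap, _ = objective_terms(v1, q)
region = [x for x, v in zip(xs, v1) if v > 1e-9]   # analogue of the training region
print("complementarity gap:", gap)
print("v1 > 0 for x up to:", max(region))

v1_reg = solve_toy(q, w_reg=0.05)        # with smoothness regularization
gap_reg, _ = objective_terms(v1_reg, q)
print("gap with regularization:", gap_reg)
```

With the regularization weight set to zero the minimizer attains zero objective and satisfies the complementarity condition; a positive weight trades some of that accuracy for a smoother iterate, mirroring the trade-off described above.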